Abstract

TODO write abstract

Table of contents

Background

Data

Analysis

Supplemental


Introduction

Last update: May 2013

Radiation protection standards are based primarily upon observations of atomic bomb survivors supplemented by animal studies ??BEIR VII source??. The animal studies are used, amoung other things to estimate the effects of protraction, i.e. would the dose delivered by the atomic bombs have induced as many health effects if it were delivered more slowly and spread across a broader population like modern radiation exposures?

It is generally ??linear quadratic citations??, though not unanimously ??disenting citations??, assumed that the dose response to low-LET radiation has a linear-quadratic form. Concretely:

risk ~ a * Dose + B * Dose^2

This formula is based on observations of cellular systems which regularly show a linear quadratic response to radiation. Specifically, log(cell viability) and gross chromosomal abberations show a linear quadratic response to radiaiton in most cellular systems ??citation??.

Mechanistically this is explained by DNA damage and repair. A single track of low-LET radiation may lead to inviabilty or

TODO: why does cell viability need a log opperation to be linear quadratic where chromosomal abberations do not? Is it possible that mortality also needs a log transformation?

Health effects, especially carcinogenesis rates, observed in the atomic bomb survivors are divided by a dose and dose rate effectiveness factor (DDREF) to estimate the health effects that would be observed if the same dose had been protracted.

TODO fill in ?? above

TODO(later) write an abstract TODO(later) read and spruce up this doc

Defining DDREF

Last update: April 2013

What is DDREF?

DDREF is defined ambiguosly. It can be derived from acute exposures as 1 + Dβ/α where response ~ Dα + Dβ^2. But for the purposes of radiation protection, we need a functional definition, that distinguishes a cutoff of dose and doserate beyond which the DDREF correction ought to be used. What are these cutoffs? I will do a literature search to find out.

Notes

ICRP 2007 3.2.1 (70-73)
DDREF is 2 based on LLS dose response curves, ‘experimental data’, and ‘probabilistic uncerainty analysis’ conducted by others (NCRP, 1997, EPA, 1999, NCI/CDC, 2003, Annex A).

ICRP 2007 A.3.1 (A 62)
When dose rates are lower than around 0.1 Gy/hour there is repair of cellular radiation injury during the irradiation. This causes the b component to decrease and to reach zero at very low dose rates. The a component is not modifiable by changing dose rate.

BEIR VII Chapter 2 - How B decreases with dose rate (Edwards and others 1989) - Estimate DDREF from animal data (Tucker and others 1998) - More DDREF from animals data (Lorenz and others 1994) - More DDREF in animals studies (Ullrich and Storer 1979a; Ullrich and others 1987)

BEIR VII Chapter 10
DREF is the term for dose rate reduction, as opposed to both dose and dose rate reduction

  • pg 246 claims that up to 24 hours seems to be required for full repair from a single dose
  • pg 248 Says LSS DDREF are roughly equivilant to UNSCEAR DDREF estimated at 1 Sv of exposure.
  • pg 250 used a cutoff of 1.5 - 2 Gy for animal data to avoid leveling off effects!!!!!!
  • pg 254 when fractionated, dose response can be described as aD + B(D^2/K) where K is the number of fractions
  • pg 255 they were forced to use mean survival times, instead of lifespan, because they did not have individual level data! They admit this is problematic, something I can address!
  • pg 255 importantly they ignore data that does not account for competing sources of risk. I do not know if I can do this! But perhaps this is rather irrelevant in the case of lifespan most of these studies
  • pg 255 They only used the accute exposures from Edwards (1992) excluding tables 1 and 2

References

  • Edwards 1989, Chromosome aberrations in human lymphocytes
  • Edwards 1992, Low Dose and Low Dose Rate Effects in Laboratory Animals
  • Lorenz 1994, Dose and dose-rate dependence of the frequency of hprt deficient T lymphocytes in the spleen of the 137Cs gamma-irradiated mouse
  • Tucker 1998, The accumulation of chromosome aberrations and Dlb-1 mutations in mice with highly fractionated exposure to gamma radiation
  • Ullrich 1979a, Influence of gamma irradiation on the development of neoplastic disease in mice. I. Reticular tissue tumors
  • Ullrich 1987, Myeloid leukemia in male RFM mice following irradiation with fission spectrum neutrons or gamma rays.

Conclusions

  1. Survival hazard was not used in BEIR VII, only mean lifespan.
  2. BEIR VII used 1.5 Sv as an upper threshold, because response falls off above this level, I should too.
  3. When doses are fractionated apply the equation: a*D + B*(D^2/K), where K is the number of fractions
  4. Dealing with doserate changes is hard, pg 246 suggests that repair processes take up to 24 hours, so this might be a natural break point.
  5. Estimate DDREF at 1 Sv to make it compatible with lss estimates

^ back to table of contents


Log likelihood

I need to know how log likelihood is calculated. Basically it should be proportional to

sum((actual - predicted)^2 / variance)

And variance is measured as

variance = sum((actual - predicted)^2)) / (n - p - 1)

If variance is estimated at zero, this goes to infinity, so the overfit regression model has problems.

data <- data.frame(
    y=1:100,
    x=rnorm(100, 1:100)
)
n <- nrow(data)
m <- glm(y ~ x, data=data)
data$p <- predict(m)
data$e <- data$y - data$p
o2 = with(data, 
    sum(e^2) / (n)
)
o2
## [1] 1.224
l <- with(data, 
    -(n/2) * log(2*pi) +
    -(n/2) * log(o2) +
    -(1/(2*o2)) * sum(e^2)
)
l - as.numeric(logLik(m))
## [1] 0

^ back to table of contents


DATA: Data Funnel

Last update: April 2013

Introduction:

Create a data set that might be used for DDREF analysis and make a description of this data.

# Common functions
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# DATA
d <- readRDS('data/external5.rds')

# Report for funnel graph
count <- function(data){
  count_unique <- function(x) length(unique(x))
  with(data, c(
    studies      = count_unique(file),
    clusters     = count_unique(cluster),
    treatments   = count_unique(group_id),
    animals      = count_unique(id),
    'not vetted' = count_unique(id) - count_unique(id[is_vetted]),
    'to exclude' = count_unique(id[exclude])
  ))
}

filter_by_n_groups <- function(data, threshold=3){
  ddply(data, .(cluster), function(df){
    n_groups = length(unique(paste(df$dose, 
                                   df$dose_rate, 
                                   df$fractions)))
    if(n_groups >= threshold){
      return(df)
    } else {
      return(NULL)
    }
  })
}

# Define            
bad_qualities <- c(
  'accel. alpha local',
  'accel. alpha whole body',
  'accel. neutrons 0.1-10 MeV',
  'neutrons 1-10 MeV',
  'neutrons C-252',
  'neutrons fission',
  'neutrons>10 MeV',
  'X-rays local', 
  'gamma-rays local',
  'Bremsstrahlung > 3MeV.'
)

# Aliases
# Allow a more concise representation of a name.
# For instace ♂ is preferable to Male
aliases <- list(
  'quality'= c(
    'gamma-rays Co-60'='γ-ray',
    'gamma-rays Co-60, gamma-rays Co-60'='γ-ray',
    'gamma-rays Co-60, gamma-rays Co-60, gamma-rays Co-60'='γ-ray',
    'gamma-rays Cs-137'='γ-ray',
    'gamma-rays whole body'='γ-ray',
    'gamma-rays'='γ-ray',
    'X-rays whole body'='X-ray'
  ),
  'sex'= c(
    Both='♂/♀',
    Female='♀',
    Male='♂'
  ),
  'lab'= c(
    '2'='CEN-FAR',
    '3'='ENEA',
    '9'='SCK/CEN',
    '11'='TNO',
    '1002'='DAVIS',
    '1003'='ANL',
    '1005'='ITRI',
    '1007'='ORNL',
    '1008'='CSU'
  ),
  'strain'=c(
    'beagle'='Beagle'
  )
)


threshold_dose <- 1.5

# Fix
# TODO(later) move these fixes to radiation.R

# Stray fixes
## Some of the B6CF1 mice are missing their species
d[d$strain == 'B6CF1', 'species'] <- 'Mouse'
## One animal is listed as a control despite having a dose, remove her
contradictory_dose_and_quality <- (d$quality == 'none (controls)' & 
                                     d$dose != 0 & 
                                     !is.na(d$dose) &
                                     !is.na(d$quality))
d <- d[!contradictory_dose_and_quality,]

# NA doses
d$dose[is.na(d$dose)] <- 0
d$dose_rate[is.na(d$dose_rate)] <- 0
d$fractions[d$dose == 0] <- 0
d$dose_rate[d$dose == 0] <- 0

# NA quality
d$quality[is.na(d$quality)] <- 'none (controls)'

# Add missing fractions
d$fractions[is.na(d$fractions)] <- 1
d$fractions[d$fractions == 0] <- 1

# Add fractions seperated by days
d$day_fractions <- d$fractions
s <- d$fraction_interval < 1 & !is.na(d$fraction_interval)
d$day_fractions[s] <- d$fractions[s] * d$fraction_interval[s]

# Add lab
d$lab <- sub('(^[0-9]*).*$', '\\1', d$study_id)

# If assignment age is not listed, assume it is zero
d$assignment_age[is.na(d$assignment_age)] <- 0

# If an animal is a control, it should have no age at treatment
# Nor an age at last treatment
d$age_at_treatment[d$dose == 0] <- NA
d$age_at_last_treatment[d$dose == 0] <- NA

# Age at last treatment
d$age_at_last_treatment <- d$age_at_treatment
s <- !is.na(d$fraction_interval)
d$age_at_last_treatment[s] <- with(d[s,], 
                                   age_at_treatment + 
                                   fraction_interval * (fractions - 1),
)

# Assign aliases
# Replace all values in a given column with their aliases
# e.g. replace gamma-rays with γ
for(column in names(aliases)) {
  for(name in names(aliases[[column]])) {
    alias = aliases[[column]][name]
    d[d[column] == name & !is.na(d[column]),column] <- alias
  }
}

Define Clusters

In general we want to cluster on:

lab, species, strain, and sex

d$cluster = with(d, paste(sex, 
                          strain, 
                          species, 
                          lab, sep='--'))

But also

`assignment_age and quality`

Which require special consideration.

Intended Assignment Age

Assignment age was usually recorded ‘as intended’. So that all mice in a group have an assignment age of 56 days old. Such precision is dubious/impossible and most likely represents a reconstruction based on the methods described about the experiment.

By contrast, argonne data recorded true age at treatment assignment so that animals vary by up to 50 days within a single cluster. These animals should not be divided into seperate clusters, because they are all adults. By contrast animals irradiated at -4 days and 7 days old should be put into seperate age clusters because they represent very different stages of development.

The most complete way to handle this situation is do define a lifestage by age for each species and use this for clustering. But this approach is arbitrary, contrived, and needlessly complex.

Instead we will define a new feild, approximate_assignment_age. For most groups this will be the reported assignment_age. For argonne groups we will define it by the median assignment_age.

d$intended_assignment_age <- d$assignment_age
labs_that_recorded_true_age_at_assignment <- c(
  'ANL'
)
clusters_that_recorded_true_age_at_assignment = unique(
  d$cluster[
    d$lab %in% labs_that_recorded_true_age_at_assignment
])
for(c in clusters_that_recorded_true_age_at_assignment) {
  d$intended_assignment_age[d$cluster == c] <- 
    median(d$assignment_age[d$cluster == c])
}
d$cluster <- paste(d$cluster, d$intended_assignment_age, sep='--')
Duplicate controls

Control animals may control for multiple clusters. For example the same mouse could control for a group exposed to gamma rays and others exposed to x-rays. Therefore control groups ought to be duplicated and included in each cluster that they might control for.

Concretely, control animals should match sex, species, strain, assignment age, and lab. Controls should be duplicated for each unique quality and age of first exposure provided the aforementioned criteria are met.

d <- ddply(d, .(cluster), function(df) {
  control   <- df[df$quality == 'none (controls)',]
  treatment <- df[df$quality != 'none (controls)',]
  
  # Create a cluster for each non control intended age of treatment
  ddply(treatment, .(quality), function(treatment_group){
        
    # Add control to each treatment group
    df2 <- rbind(treatment_group, control)
    
    # Define quality by the treatment group
    # i.e. 
    #    quality = 'none (control)'  ->  quality = 'gamma'
    df2$quality <- treatment_group$quality[1]
    
    # Add quality to the cluster name
    df2$cluster <- with(df2, paste(cluster, 
                                   quality,
                                   sep='--'))    
    df2
  })
})

# Label the duplicates
d$duplicates <- duplicated(d$id)

# Show that all of them recieved 0 dose
with(d, all(dose[duplicates] == 0))
## [1] TRUE
# Count the number of duplicates
table(d$duplicates)  # 23657
## 
##  FALSE   TRUE 
## 116542  23657
# FILTER

# Initial counts
count(d)        
##    studies   clusters treatments    animals not vetted to exclude 
##         24         82        827     116542      60118      15465
# Only low-LET, whole body
d <- d[!d$quality %in% bad_qualities,]     
count(d)        
##    studies   clusters treatments    animals not vetted to exclude 
##         24         43        457      76096      23213      15465
# Dose below threshold (as in BEIR VII)
d <- d[!(d$dose > threshold_dose),] 
count(d)        
##    studies   clusters treatments    animals not vetted to exclude 
##         23         35        230      45730       6220       9662
# Lifespan not NA
d <- d[!is.na(d$lifespan),]
count(d)
##    studies   clusters treatments    animals not vetted to exclude 
##         23         35        230      45722       6220       9662
# No other treatments
d <- d[d$other_treatments == 'none',]
count(d)
##    studies   clusters treatments    animals not vetted to exclude 
##         21         34        175      43043       4832       8516
# Died before their 'assignment age'
# TODO(later) why should there be any mice that died before their assignemtn age?
d <- d[d$lifespan > d$assignment_age,]
count(d)
##    studies   clusters treatments    animals not vetted to exclude 
##         21         34        175      42948       4797       8509
# Remove those that should be excluded
exclusions <- sort(unique(d$reason))
exclusions <- exclusions[exclusions != ""]
for(ex in exclusions){
  d <- d[!d$reason == ex,]
  print(ex)
  print(count(d))
}
## [1] "see exclusion-1 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##         21         34        174      42941       4797       8502 
## [1] "see exclusion-10 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##         21         33        142      40906       4797       6467 
## [1] "see exclusion-11 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##         20         32        138      39803       4797       5364 
## [1] "see exclusion-12 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##         20         32        137      39506       4797       5067 
## [1] "see exclusion-13 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##         19         32        136      39440       4797       5001 
## [1] "see exclusion-5 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##         18         31        133      39419       4797       4980 
## [1] "see exclusion-7 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##         18         31        132      35706       4797       1267 
## [1] "see exclusion-8 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##         17         31        129      34997       4797        558 
## [1] "see exclusion-9 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##         16         27        119      34439       4797          0
# Remove cases with few treatment groups    
d <- filter_by_n_groups(d)
count(d) 
##    studies   clusters treatments    animals not vetted to exclude 
##          8         17         95      29072          0          0
#    studies   clusters treatments    animals not vetted to exclude 
#          8         17         95      29072          0          0 

# How many duplicates after filtering?
table0(d$duplicates)  # 19462 FALSE 9610 TRUE
## 
## FALSE  TRUE 
## 19462  9610
# Warnings
# show, but do not remove
warnings <- sort(unique(d$warning_reason))
warnings <- warnings[warnings != ""]
d_wo_warnings <- d
for(w in warnings){
  d_wo_warnings <- d_wo_warnings[
    !d_wo_warnings$warning_reason == w,
    ]
  print(w)
  print(count(d_wo_warnings))
}
## [1] "see warning-1 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##          7         15         83      20639          0          0 
## [1] "see warning-2 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##          7         15         82      20209          0          0 
## [1] "see warning-3 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##          7         15         80      19905          0          0 
## [1] "see warning-4 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##          6         14         77      19658          0          0 
## [1] "see warning-5 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##          6         14         76      19305          0          0 
## [1] "see warning-6 in radiation.R"
##    studies   clusters treatments    animals not vetted to exclude 
##          6         14         75      18995          0          0
d_wo_warnings <- filter_by_n_groups(d_wo_warnings)
count(d_wo_warnings)
##    studies   clusters treatments    animals not vetted to exclude 
##          6         13         73      18903          0          0
#    studies   clusters treatments    animals not vetted to exclude 
#          6         13         73      18903          0          0 

# Save
setwd('~/janus')
saveRDS(d, 'data/funneled.rds')

^ back to table of contents


Clean

DDREF data has been filtered. Now its time to prettify it.

# Common functions
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
setwd('~/janus')
d <- readRDS('data/funneled.rds')

# Is acute?
d$acute <- d$fractions == 1 | d$dose == 0
d$protracted <-  d$dose > 0 & 
                 (d$fractions > 1 |
                  d$dose_rate < 0.01)

# Observations per cluster
d = d %>%
  group_by(cluster) %>%
  mutate(n_in_cluster=length(cluster))

# Give the clusters pretty names
for(c in unique(d$cluster)) {
  elements = as.list(strsplit(c, '--')[[1]])
  names(elements) <- c('sex', 'strain', 'species', 'lab', 'age', 'quality')
  pretty_cluster = with(elements, paste(
    sex, strain, pluralize(species), lab,
    '\n', quality, 'at', age, 'days old'))
  d$cluster[d$cluster == c] <- pretty_cluster
}

# Define Acute
chronic <- d$fractions > 1
d$type <- 'A'
d$type[chronic] <- 'C'

# Order clusters
# By number of observations.  This will put the cluster with
# the most observations first in ggplots
sort_by_n <- function(x) {
  factor(x,
         levels = names(sort(table(x), decreasing=TRUE)))
}
d$cluster = sort_by_n(d$cluster)
d$cluster_id <- as.numeric(d$cluster)
d$cluster <- with(d, paste0(cluster_id, ' - ', cluster))
d$cluster = sort_by_n(d$cluster)

# Save Data for later use
saveRDS(d, 'data/ddref.rds')
write.csv(d,file='data/ddref.csv')

Results

Data is so fresh and so clean.

Concordance

Last update: April 2013

Give a detailed description of the dataset for those that want to make a close inspection.

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
setwd('~/janus')
d <- readRDS('data/ddref.rds')

# Helpers
group_summary <- function(data){
    .all <- function(x, c=' ') paste(unique(x), collapse=c)
    summary <- with(data, c(
        ' ♂ '          = sum(sex == '♂'),
        ' ♀ '          = sum(sex == '♀'),
        'avg. age'      = round(mean(lifespan)),
        'dose'          = .all(signif(dose, 2)),
        'rate'          = .all(signif(dose_rate, 1)),
        '# fractions'   = .all(fractions),
        warnings        = .all(gsub('[^0-9]', '', warning_reason))
    ))
    summary[summary == 0] <- '-'
    summary[is.na(summary)] <- '-'  
    summary
}

find_in_file <- function(pattern, file='~/janus/scripts/exp/radiation.R'){
        lines <- readLines(file)
        lines[grepl(pattern, lines)]
}

Group details

Show the details of each treatment group in the dataset organized by cluster

for(id in sort(unique(d$cluster_id))) {
  df <- d[d$cluster_id == id,]
  cluster <- as.character(df$cluster[1])
  cat('\n', 
      cluster, 
      '\n------------------------------------------------\n')
  s <- ddply(df, .(group_id), function(df) {
    group_summary(df)
  })
  s <- s %>% 
    mutate(study=sub('^[0-9]*-', '', group_id),
           group=as.numeric(sub('^[0-9]*-', '', study)),
           study=as.numeric(sub('-[0-9]*$', '', study))) %>%
    arrange(study, group) %>%
    select(-study, -group)
  print(s, row.names=FALSE)
}
## 
##  1 - ♀ RFM/Un Mice ORNL 
##  γ-ray at 70 days old 
## ------------------------------------------------
##   group_id  ♂    ♀  avg. age dose rate # fractions warnings
##   1007-3-9   - 2696      632  0.1  0.4           1        1
##  1007-3-10   -  930      614 0.25  0.4           1        1
##  1007-3-11   - 1064      553  0.5  0.4           1        1
##  1007-3-12   -  237      541 0.75  0.4           1        1
##  1007-3-13   - 1045      538    1  0.4           1        1
##  1007-3-14   - 1005      487  1.5  0.4           1        1
## 
##  2 - ♀ B6CF1 Mice ANL 
##  γ-ray at 114 days old 
## ------------------------------------------------
##   group_id  ♂    ♀  avg. age dose  rate # fractions warnings
##  1003-20-2   -  857      945    -     -           1         
##  1003-20-4   -  397      903 0.86  0.04           1         
##  1003-21-2   -  185      932    -     -           1         
##  1003-21-4   -  200      936 0.86  0.04           1         
##  1003-22-2   -  464      970    -     -           1         
##  1003-24-2   -  175      991    -     -           1         
##  1003-25-2   -   50      960    -     -           1         
##  1003-26-2   - 1138      978    -     -           1         
##  1003-26-3   -  497      963 0.22  0.01           1         
##  1003-26-4   -  346      968 0.43  0.02           1         
##  1003-26-5   -  194      935 0.86  0.04           1         
##  1003-29-2   -  584      986    -     -           1         
##  1003-29-4   -  598      957    1 8e-04          60         
##  1003-30-2   -  399      977    -     -           1         
## 
##  3 - ♂ B6CF1 Mice ANL 
##  γ-ray at 113 days old 
## ------------------------------------------------
##   group_id  ♂   ♀  avg. age dose  rate # fractions warnings
##  1003-20-1 843   -      952    -     -           1         
##  1003-20-3 386   -      922 0.86  0.04           1         
##  1003-21-1 200   -      985    -     -           1         
##  1003-21-3 199   -      970 0.86  0.04           1         
##  1003-21-5 160   -      939  1.4  0.07           1         
##  1003-22-1 557   -      985    -     -           1         
##  1003-24-1 310   -      987    -     -           1        6
##  1003-25-1  60   -     1011    -     -           1         
##  1003-26-1 200   -     1043    -     -           1         
##  1003-28-1 120   -     1020    -     -           1         
##  1003-29-1 592   -      993    -     -           1         
##  1003-29-3 594   -      971    1 8e-04          60         
##  1003-30-1 393   -     1007    -     -           1         
## 
##  4 - ♂ C57BL/Cnb Mice SCK/CEN 
##  γ-ray at 84 days old 
## ------------------------------------------------
##  group_id  ♂   ♀  avg. age dose rate # fractions warnings
##     9-6-1 467   -      613    -    -           1         
##     9-6-2 241   -      581 0.25  0.3           1         
##     9-6-3 236   -      564  0.5  0.3           1         
##     9-6-4 241   -      550    1  0.3           1         
##     9-6-8 107   -      605 0.25  0.3          10         
##     9-6-9 109   -      604  0.5  0.3          10         
##    9-6-10 115   -      615    1  0.3          10         
##    9-6-14 104   -      622    1  0.3           8         
## 
##  5 - ♂ RFM/Un Mice ORNL 
##  γ-ray at 70 days old 
## ------------------------------------------------
##  group_id  ♂   ♀  avg. age dose rate # fractions warnings
##  1007-3-1 430   -      711    -    -           1        1
##  1007-3-2 256   -      720  0.1  0.4           1        1
##  1007-3-3  94   -      711 0.25  0.4           1        1
##  1007-3-4 247   -      680  0.5  0.4           1        1
##  1007-3-5 230   -      673    1  0.4           1        1
##  1007-3-6 199   -      651  1.5  0.4           1        1
## 
##  6 - ♂ BALB/c/Cnb Mice SCK/CEN 
##  γ-ray at 84 days old 
## ------------------------------------------------
##  group_id  ♂   ♀  avg. age dose rate # fractions warnings
##     9-5-1 322   -      766    -    -           1         
##     9-5-2 191   -      745 0.25    4           1        3
##     9-5-3 194   -      736  0.5    4           1         
##     9-5-4 191   -      732    1    4           1         
##     9-5-8 111   -      778 0.25    4          10         
##     9-5-9 110   -      740  0.5    4          10         
##    9-5-10 113   -      751    1    4          10        3
## 
##  7 - ♀ BC3F1 Mice ENEA 
##  X-ray at 91 days old 
## ------------------------------------------------
##  group_id  ♂   ♀  avg. age dose rate # fractions warnings
##     3-1-1   - 353      889    -    -           1        5
##     3-1-2   - 100      912 0.04 0.06           1         
##     3-1-3   -  84      893 0.08 0.06           1         
##     3-1-4   -  53      854 0.16 0.06           1         
##     3-1-5   -  58      874 0.32 0.06           1         
##     3-1-6   -  57      833 0.64  0.6           1         
##     3-1-7   -  60      707  1.3  0.6           1         
##     3-1-9   - 279      865    -    -           1         
## 
##  8 - ♂ C57BL/6Bd Mice ORNL 
##  γ-ray at 70 days old 
## ------------------------------------------------
##  group_id  ♂   ♀  avg. age dose rate # fractions warnings
##  1007-2-2 502   -      906    -    -           1         
##  1007-2-4 254   -      909  0.5  0.4           1         
##  1007-2-6 260   -      922    1  0.4           1         
## 
##  9 - ♀ C3Hf/Bd Mice ORNL 
##  γ-ray at 70 days old 
## ------------------------------------------------
##   group_id  ♂   ♀  avg. age dose rate # fractions warnings
##   1007-2-9   - 501      778    -    -           1         
##  1007-2-11   - 249      727  0.5  0.4           1         
##  1007-2-13   - 250      693    1  0.4           1         
## 
##  10 - ♀ C57BL/6Bd Mice ORNL 
##  γ-ray at 70 days old 
## ------------------------------------------------
##  group_id  ♂   ♀  avg. age dose rate # fractions warnings
##  1007-2-1   - 491      858    -    -           1         
##  1007-2-3   - 253      855  0.5  0.4           1         
##  1007-2-5   - 251      865    1  0.4           1         
## 
##  11 - ♂ C3Hf/Bd Mice ORNL 
##  γ-ray at 70 days old 
## ------------------------------------------------
##   group_id  ♂   ♀  avg. age dose rate # fractions warnings
##  1007-2-10 502   -      732    -    -           1         
##  1007-2-12 244   -      713  0.5  0.4           1         
##  1007-2-14 248   -      721    1  0.4           1         
## 
##  12 - ♂ leucopus Peromyscus ANL 
##  γ-ray at 137 days old 
## ------------------------------------------------
##   group_id  ♂   ♀  avg. age   dose  rate # fractions warnings
##  1003-27-1 210   -     1388      -     -           1         
##  1003-27-2 203   -     1461      -     -           1         
##  1003-27-3 189   -     1358 0.0086 4e-04           1         
##  1003-27-4 181   -     1350  0.014 7e-04           1         
## 
##  13 - ♂ BC3F1 Mice ENEA 
##  X-ray at 92 days old 
## ------------------------------------------------
##  group_id  ♂   ♀  avg. age dose rate # fractions warnings
##    3-5-19 430   -      824    -    -           1        2
##    3-5-20  44   -      828  0.5  0.1           1         
##    3-5-21  48   -      797    1  0.1           1         
## 
##  14 - ♂ C57BL/Cnb Mice SCK/CEN 
##  X-ray at 7 days old 
## ------------------------------------------------
##  group_id  ♂   ♀  avg. age dose rate # fractions warnings
##     9-7-1 105   -      757    -    -           1        4
##    9-7-10  72   -      777  0.5    1           1        4
##    9-7-11  70   -      810    1    1           1        4
## 
##  15 - ♂ BC3F1 Mice ENEA 
##  X-ray at -4 days old 
## ------------------------------------------------
##  group_id  ♂   ♀  avg. age dose rate # fractions warnings
##     3-5-1  34   -      853    -    -           1         
##     3-5-3  48   -      799  0.3  0.1           1         
##     3-5-5  61   -      822  0.9  0.1           1         
##     3-5-7  46   -      897  1.5  0.1           1         
## 
##  16 - ♀ BC3F1 Mice ENEA 
##  X-ray at -4 days old 
## ------------------------------------------------
##  group_id  ♂   ♀  avg. age dose rate # fractions warnings
##     3-5-2   -  39      866    -    -           1         
##     3-5-4   -  40      883  0.3  0.1           1         
##     3-5-6   -  44      850  0.9  0.1           1         
##     3-5-8   -  50      872  1.5  0.1           1         
## 
##  17 - ♂ BC3F1 Mice ENEA 
##  X-ray at 580 days old 
## ------------------------------------------------
##  group_id  ♂   ♀  avg. age dose rate # fractions warnings
##    3-5-35  41   -      886    -    -           1         
##    3-5-36  42   -      901  0.5  0.1           1         
##    3-5-37  43   -      874    1  0.1           1

Warnings

Some issues were found when digging through the input data that were not judged to be severe enough to exclude the data, but do exemplify deviations from our expectations. These are listed as follows.

# Warnings are listed in radiation.R on lines that start with
# the following prefix
prefix <- "# warning-"

# Get a list of all the warnings relevant to this dataset
warnings <- gsub('[^0-9]', '', unique(d$warning_reason))
warnings <- sort(as.numeric(warnings[warnings != ""]))

# Find the corresponding warning definition in radiation.R
warning_prefixes <- paste0(prefix, warnings)
for(p in warning_prefixes) cat(find_in_file(p), '\n')
## # warning-1: Mean lifespan in the groups in this study varied from those reported in the formal literature (table 1 of Ullrich 1979 - jstor.org/stable/pdfplus/3575012.pdf).  Two groups varied substantially as noted in exclusion-7.  The others varied by 0-8 days, always less than one standard deviation of the mean.  I searched dillegently for cause of death annotations which should be excluded to correct the figures, but I could not find any.  The problem seems to originate with the data source. 
## # warning-2: Group 3-5-19 has 430 animals with a mean lifespan of 824 days in the data but should only have 203 animals with a mean lifespan of 827 days according to 3576356.pdf table II.  The difference in mean lifespan is small and this is the only group with such a disparity, which makes me think its probably worth including regardless. 
## # warning-3: 9-5-2 and 9-5-10 the mean lifespans in the data (738 and 739) are not the same at those in 3575970.pdf table 1 (743 and 747).  These are the only two discrepiances and are both less than 10 days.  Probably ok. 
## # warning-4: mean lifespans in 9-7 are consistently off by 2 (both up and down) from those in 3579307.pdf table 1.  The disparity is small, much less than the standard deviation, but still a bit worrisome. 
## # warning-5: 3-1-1 has a lifespan two days higher in the data (889) than in 3577210.pdf table 1 (887).  This is the only problem in this dataset which makes me think that the data is correct and the original table is wrong. 
## # warning-6: 1003-24-1 Has a mean lifespan of 878 days in the data and 887 days in table 11 of anl-95-3.  The counts and standard errors are the same and the total size of the disprepiancy is small ~10 days.  I am guessing that the table contains a typo

Study Details

Here are a description of each of the original studies used in this analysis as provided by the ‘Gray Book’ (Gerber et al. 1996). Studies can easily be found by lab and study id which are the first two parts of the group id. Concretely, a group id of 1007-3-6 is the sixth groups from the third study conducted at lab 1007, Oak Ridge National Laboratory.

TODO(later) summarize this information in a table TODO(later) this information should be stored as data, not text

3-1

cluster 7 - ♀ BC3F1 Mice ENEA X-ray at 91 days old


More detail in (Covelli 1988)

3-5

cluster 13 - ♂ BC3F1 Mice ENEA X-ray at 92 days old
cluster 15 - ♂ BC3F1 Mice ENEA X-ray at -4 days old
cluster 16 - ♀ BC3F1 Mice ENEA X-ray at -4 days old
cluster 17 - ♂ BC3F1 Mice ENEA X-ray at 580 days old


More detail in (Covelli 1984)

9-5

cluster 6 - ♂ BALB/c/Cnb Mice SCK/CEN γ-ray at 84 days old


More detail in (Maisin 1983)

9-6

cluster 4 - ♂ C57BL/Cnb Mice SCK/CEN γ-ray at 84 days old



More detail in (Maisin 1988)

9-7

cluster 14 - ♂ C57BL/Cnb Mice SCK/CEN X-ray at 7 days old


More detail in (Maisin 1988)

1003-20

cluster 2 - ♀ B6CF1 Mice ANL γ-ray at 114 days old
cluster 3 - ♂ B6CF1 Mice ANL γ-ray at 113 days old


More detail in (Grahn 1995)

1003-21

cluster 2 - ♀ B6CF1 Mice ANL γ-ray at 114 days old
cluster 3 - ♂ B6CF1 Mice ANL γ-ray at 113 days old


More detail in (Grahn 1995)

1003-22 cluster (only controls)

cluster 2 - ♀ B6CF1 Mice ANL γ-ray at 114 days old
cluster 3 - ♂ B6CF1 Mice ANL γ-ray at 113 days old


More detail in (Grahn 1995)

1003-24 cluster (only controls)

cluster 2 - ♀ B6CF1 Mice ANL γ-ray at 114 days old
cluster 3 - ♂ B6CF1 Mice ANL γ-ray at 113 days old


More detail in (Grahn 1995)

1003-25 cluster (only controls)

cluster 2 - ♀ B6CF1 Mice ANL γ-ray at 114 days old
cluster 3 - ♂ B6CF1 Mice ANL γ-ray at 113 days old


More detail in (Grahn 1995)

1003-26 cluster

cluster 2 - ♀ B6CF1 Mice ANL γ-ray at 114 days old
cluster 3 - ♂ B6CF1 Mice ANL γ-ray at 113 days old


More detail in (Grahn 1995)

1003-27 cluster

cluster 12 - ♂ leucopus Peromyscus ANL γ-ray at 137 days old


More detail in (Grahn 1995)

1003-28 cluster (only controls)

cluster 2 - ♀ B6CF1 Mice ANL γ-ray at 114 days old
cluster 3 - ♂ B6CF1 Mice ANL γ-ray at 113 days old


More detail in (Grahn 1995)

1003-29 cluster

cluster 2 - ♀ B6CF1 Mice ANL γ-ray at 114 days old
cluster 3 - ♂ B6CF1 Mice ANL γ-ray at 113 days old


More detail in (Grahn 1995)

1003-30 cluster (only controls)

cluster 2 - ♀ B6CF1 Mice ANL γ-ray at 114 days old
cluster 3 - ♂ B6CF1 Mice ANL γ-ray at 113 days old


More detail in (Grahn 1995)

1007-2

cluster 8 - ♂ C57BL/6Bd Mice ORNL γ-ray at 70 days old cluster 9 - ♀ C3Hf/Bd Mice ORNL γ-ray at 70 days old cluster 10 - ♀ C57BL/6Bd Mice ORNL γ-ray at 70 days old
cluster 11 - ♂ C3Hf/Bd Mice ORNL γ-ray at 70 days old


More detail in (Storer 1988)

1007-3

cluster 1 - ♀ RFM/Un Mice ORNL γ-ray at 70 days old
cluster 5 - ♂ RFM/Un Mice ORNL γ-ray at 70 days old


More detail in (Ullrich 1979)

References

Gerber, Watson, Sugahara, Okada. International Radiobiology Archives of Long-Term Animal Studies I. Descriptions of Participating Institutions and Studies. 1996. link

^ back to table of contents


Visualize

Last Update: April 2014

Now that the data is reasonably clean, show what it looks like.

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
setwd('~/janus')
d <- readRDS('data/ddref.rds')
# TODO(later): figure out what is the cause of early effects in ORNL data based on the pathology codes released with that dataset.

ggplot(d,
       aes(lifespan, 
           color=dose,
           group=factor(paste(dose, 
                              dose_rate,
                              fractions)),
           y=..scaled..)) + 
  geom_density(adjust=2) +
  facet_wrap(~ cluster, scales='free') + 
  scale_colour_gradient(
    guide = guide_legend(title = "Dose (Gy)"),
    trans = "sqrt"
  ) + 
  geom_vline(
    aes(xintercept=intended_assignment_age),
    alpha=0.5
  ) +
  expand_limits(x = -4)

plot of chunk unnamed-chunk-12

Figure: Lifespan by dose and cluster

Density plots show the frequency, y-axis, of a given lifespan (in days), x-axis. Sepperate density curves are shown for each distinct dose and dose rate. The total dose delivered is labeled by color as shown in the figure legend. The mean age at first exposure is denoted by a gray vertical line on each graph.

Clusters are sepperated by facets and labeled with sex, strain, species, lab, quality of radiation, and mean age at first exposure (in days). The clusters are ordered by the number of animals in the cluster. The cluster with the most animals, Male RFM/Un Mice from ORNL, are at the top left and the cluster with the least animals is furthest to the right on the bottom row.

Figure: Directly comparable protracted exposures

Similar to the figure above, except that the data has been limited to those cases were there are two groups in the same cluster with the same dose, but differences in dose rate or number of fractions. Doses that were delivered more slowly are labeled as protracted (dotdash line). Facets of the graph are defined by cluster and total dose delivered. Controls are included for comparison.

Additional details

In cluster 4, doses were broken into 10 or 8 fractions seperrated by three hours or 1 day respectively. In cluster 6 doses were broken into 10 fractions sepperated by one day. Notably both of these experiments were conducted at the same laboratory (SCK/CEN) using the same irradiator (a CS-137 source).

*Note, ORNL also conducted experiments with directly comparable doses both acute and protracted, but individual level data is not avaiable for the acute exposures.

Observations

The effects of fractionation are decidely ambivilant. Cluster 4 shows signs of high DDREF, cluster 6 shows signs of a DDREF close to 1. The only substantial difference between these experiments is the strain used. Three facts might account for this observation, perhaps DDREF is strain specific, perhaps there were methodological errors in the way these studies were conducted, or perhaps random events are obscuring the true pattern.

d <- d %>%
  group_by(cluster, dose, dose_rate, fractions) %>%
  arrange(lifespan) %>%
  mutate(survival=rank(-lifespan) / length(lifespan))

ggplot(d, 
       aes(lifespan, 
           survival,
           color=dose,
           group=factor(paste(dose, dose_rate, fractions)))) + 
  geom_path() +
  facet_wrap(~ cluster, scales='free') + 
  scale_color_continuous(
    guide = guide_legend(title = "Dose (Gy)"),
    trans = "sqrt"
  ) + 
  geom_vline(
    aes(xintercept=intended_assignment_age),
    alpha=0.5
  ) +
  expand_limits(x = -4)

plot of chunk unnamed-chunk-14

Figure: Survival plots

As before, but presented as traditional survival plots instead. The y axis represents the fraction of animals alive a at a given time. Other elements are organizeds as before.

Figure: Relative survival plots

Like the survival plot above, expecept that survival group within the entire cluster is subtracted from survival within a particular group. The result is a graph which emphasizes the differences between survival. Groups with positive values (above y=0) are surviving longer than the cluster average, groups with negative values (below y=0) are surviving less time than the cluster average. In this graph groups with protracted exposures (multiple fractions or dose rates < 0.01 Gy/min)

Atomic bomb survivor data

Last update: April 2014

For comparison, let’s load data from the atomic bomb survivors, lss14, and see how lifespan changes as a function of dose in these populations using similar visualizations.

Acknowledgement: This report makes use of data obtained from the Radiation Effects Research Foundation (RERF), Hiroshima and Nagasaki, Japan. RERF is a private, non-profit foundation funded by the Japanese Ministry of Health, Labour and Welfare and the U.S. Department of Energy, the latter through the National Academy of Sciences.The conclusions in this report are those of the authors and do not necessarily reflect the scientific judgment of RERF or its funding agencies.

Please send a copy of any reprints that make use of these data to: Archives Unit, Library and Archives Section
Department of Information Technology
Radiation Effects Research Foundation
5-2 Hijiyama Park
Minami-ku Hiroshima, 732-0815 JAPAN

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Helper functions
get_max_map <- function(min_map, max=Inf) {
  max_map <- c(min_map[2:length(min_map)], max)
  names(max_map) <- names(min_map)
  max_map
}
get_mean_map <- function(min_map, max=Inf) {
  mean_map <- (min_map + get_max_map(min_map, max)) / 2
  mean_map
}

# Load data
setwd('~/janus/')
data <- read.csv('data/lss14/lss14.csv')

# Define Terms
sex_map <- c('1'='♂',
             '2'='♀')
agecat_min_map <- c('1'=0,
                    '2'=5,
                    '3'=10,
                    '4'=15,
                    '5'=20,
                    '6'=25,
                    '7'=30,
                    '8'=35,
                    '9'=40,
                    '10'=45,
                    '11'=50,
                    '12'=55,
                    '13'=60,
                    '14'=65,
                    '15'=70,
                    '16'=75,
                    '17'=80,
                    '18'=85,
                    '19'=90,
                    '20'=95,
                    '21'=100)

agecat_mean_map <- get_mean_map(agecat_min_map, 120)
dose_min_map <- c('1'=0.000,
                  '2'=0.005,
                  '3'=0.020,
                  '4'=0.040,
                  '5'=0.060,
                  '6'=0.080,
                  '7'=0.100,
                  '8'=0.125,
                  '9'=0.150,
                  '10'=0.175,
                  '11'=0.200,
                  '12'=0.250,
                  '13'=0.300,
                  '14'=0.500,
                  '15'=0.750,
                  '16'=1.000,
                  '17'=1.250,
                  '18'=1.500,
                  '19'=1.750,
                  '20'=2.000,
                  '21'=2.500,
                  '22'=3.000)
dose_mean_map = get_mean_map(dose_min_map, 4)

ctime_min_map <- c('1'=1951,
                  '2'=1956,
                  '3'=1961,
                  '4'=1966,
                  '5'=1971,
                  '6'=1976,
                  '7'=1981,
                  '8'=1986,
                  '9'=1991,
                  '10'=1996,
                  '11'=2001)

threshold = 1.5001

# Spruce up the data
data <- data %>% 
  # Translate
  # Values into different units
  mutate(sex = sex_map[sex],
         agex = agecat_min_map[agexcat],
         age  = agecat_min_map[agecat],
         dose = dose_mean_map[dosecat],
         ctime= ctime_min_map[ctime])

# Save a copy for future use
write.csv(data, 'data/lss14/lss14_spruced_up.csv')

# Shorten the data
data <- data %>% 
  # Thin
  # Only select the colums of interest
  select(city, 
         sex, 
         agex, 
         age,
         dose, 
         subjects, 
         death,
         ctime) %>%
  # Shorten
  # Remove those with a dose above threshold
  filter(dose < threshold)

# One row per death
dead <- ldply(unique(data$death), function(n) {
  d <- data[data$death == n,]
  d[rep(1:nrow(d), n),]
})

# Prove it was actually done
sum(data$death) == 49879
## [1] TRUE
nrow(dead) == 49879
## [1] TRUE
# Clean dead
# Make it truly represent one individual per row by removing
# aggregated values
dead <- dead %>%
  select(-subjects, -death) %>%
  mutate('alive' = 0)

# One row for each person still alive
# Note, defining the minimum age for people still alive is
# a little challenging because the relationship between age
# and calendar time is not one to one.  Concretely someone
# exposed at age 0-5 might be 5, 10, or 15 years old in 1950
table0(dead$age[dead$agex == 0 & 
                dead$ctime == 1951])
## 
##  5 10 15 
## 27 15  1
# this makes it difficult to know their projected age at some
# later date, like 2004.
#
# To overcome this challenge we will calculate the distribution
# of min calendar time - min age for a given age at exposure and
# use this distribution of difference to project a distribution
# of min ages for a new calendar time (2004 and later, for which
# we have no more data.
living <- ddply(data, .(agex, dose, city, sex), function(df){
  subjects = sum(df$subjects)
  dead = sum(df$death)
  living = subjects - dead
  .all <- function(x) unique(df[x])
  
  if(living == 0){
    return(NULL)
  }
  
  # There should be one row for each living person
  # and age should be a distribution.
  # see explanation above
  same_agex = data[data$agex == df$agex[1],]
  age = sample(with(same_agex, 2004 + age - ctime), 
               size=living, 
               replace=TRUE)
  
  data.frame(
    city = .all('city'),
    sex = .all('sex'),
    agex = .all('agex'),
    age = age,
    dose = .all('dose'),
    ctime = 2004,
    alive = 1)
})

# Merge living and dead
long <- rbind(dead, living)

# Prove that long is the correct size
nrow(long) == 85498
## [1] TRUE
sum(data$subjects) == 85498
## [1] TRUE
# Prove that ages look reasonable
ggplot(long, aes(ctime, age, color=alive)) + 
  geom_point(alpha=0.002) + 
  facet_wrap(~agex)

plot of chunk unnamed-chunk-17

# Reduce resolution 
# (as it is there are too many categories for graphing)
low_res <- long %>% 
  mutate(agex = floor(agex/20)*20,
         age_string = paste0(agex, '+ years'),
         lifespan = age,
         dose = round(dose * 2)/2)
g <- low_res
# Show it off
ggplot(g %>% filter(!alive), 
       aes(x=lifespan, 
           color=dose,
           group=factor(paste0(dose)),
           y=..scaled..)) +
  geom_density(adjust=2) +
  scale_colour_gradient(
    guide = guide_legend(title = "Mean Dose (Sv)"),
    trans = "sqrt",
    breaks= c(0,0.5,1.0,1.5),
    limits= c(0,1.5)
  ) + 
  geom_vline(
    aes(xintercept=agex),
    alpha=0.5
  ) +
  facet_wrap( ~ age_string + sex) +
  xlim(0, 100)

plot of chunk unnamed-chunk-18

Figure: LSS lifespan by dose, sex, and age at exposure

Analogous the the lifespan density plots shown for animals except that this details survivors of the atomic bomb exposures in Hiroshima and Nagasaki. Lifespan of survivors who died is shown on the x axis in years, the height of each line shows the relative frequency of this lifespan compared to others. All density curves are scaled so that their maximum value is 1. Density curves are groups by age at exposure and gender (shown in the facet labels) and by minimum total dose (Sv) designed by line color. Age of exposure is also designated by a vertical line.

Figure: LSS survival by dose, sex, and age at exposure

As before.

Observations
  • Not as extreme as ORNL This view shows more strongly that the effects seen in the ORNL data used in DDREF analysis were stronger than those seen in the atomic bomb survivors.
g <- low_res %>%
  ungroup() %>%
  group_by(agex, dose, sex) %>%
  arrange(age) %>%
  mutate(group_survival=rank(-age * (alive + 0.0001)) / length(age))

# How many people were alive after X years in each cluster?
g <- g %>%
  ungroup() %>%
  group_by(agex, sex) %>%
  arrange(age) %>%
  mutate(cluster_survival=rank(-age * (alive + 0.0001)) / length(age))

ggplot(g %>% filter(!alive), 
       aes(age, 
           group_survival - cluster_survival,
           color=dose,
           group=factor(dose))) + 
  geom_path() +
  facet_wrap(~ age_string + sex) + 
  scale_colour_gradient(
    guide = guide_legend(title = "Mean Dose (Sv)"),
    trans = "sqrt",
    breaks= c(0,0.5,1.0,1.5),
    limits= c(0,1.5)
  ) + 
  geom_vline(
    aes(xintercept=agex),
    alpha=0.5
  ) + 
  ylim(-0.4, 0.4)

plot of chunk unnamed-chunk-20

Figure: Relative LSS survival

As with the animal data we the difference between survival of an entire cluster (defined by age of exposure and sex) and a group exposed to a particular dose within that group.

LSS survival by sex

As before except that all age at first exposures are combined into a single graph. The total survival is calcualated by the number of people alive at a given age divided by the number of people who were exposed at that same age or younger.

Relative LSS survival by sex

As before, but now we show the difference between survival at a given dose and the overall survival of the cohort.

Reproduce BEIR 10B2

Last update: May 2014

Reproduce Figure 10B2 estimated DDREF using cancer incidence data.

Their approach

BEIR VII used total lifetime cancer indicence adjusted for competing risks for a variety of tumor endpoints. They only included acute exposures and purposefully excluded various cancer endpoints.

“Tables 1 and 2 because those risk estimates were not adjusted for competing causes of death; (2) results for doses greater than 2 Gy; and (3) results on lymphomas, ovarian cancer, reticulum cell carcinoma, and nonmyeloid leukemias, because these are thought to arise via atypical biological mechanisms, as discussed in Chapter 3, or to reflect an ill-defined combination of cancer types. The risks presented here are based on acute exposures only.” - BEIR VII pg 254

Models were fit as in 10B3 by fitting linear quadratic models to mean incidence rates per group with a fixed curvature, o. Concretely:

mean incidence rate ~ a * dose + o * a * dose^2

TODO: determine how this adjustment for competing risks was performed?

The data source

This analysis was based on data presented in [Table 6 of Edwards 1992][Edwards 1992]. That source could not be found, however most of the data from that source was also present in [Tables 3.1-3.8 of Cox 1995][Cox 1995]. This latter paper was used as the data source and the remaining experiments ??? were estimated from Figure 10B2 directly.

Note, Edwards This source was not found, however most of the same information

Edwards, A.A. 1992. Low Dose and Low Dose Rate Effects in Laboratory Animals, Technical Memorandum 1(92). Chilton, UK: National Radiological Protection Board.

# TODO: 10B2 needs to also use meta regression as in 10B4 to see how that influences the effect

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
data <- read.csv('data/edwards_1992.csv', sep='\t') 


# apply selection criteria
data <- data %>% 
  filter(protraction == 'Acute',
         !is.na(risk),
         dose <= 2.0)

# Fit model to each facet
data <- ddply(data, .(facet), function(df){
  model <- lm(risk ~ dose + I(dose^2), df, weights=(1/error^2))
  df$seperate_prediction <- predict(model)
  df
})


# Fit models to all data
# vary curvature and measure total likelihood of each model
low = -2
high = 6
delta = .01
o_range = (low/delta):(high/delta) * delta
fits <- ldply(o_range, function(o){
  model <- lm(risk ~ factor(facet) * I(dose + o*dose^2),
              data,
              weights=(1/error^2))
  c(o=o, 
    likelihood=logLik(model))
})

fits <- fits %>%
  mutate(likelihood=normalize_likelihood(likelihood, delta))

ggplot(fits, aes(o, likelihood)) + geom_path()

plot of chunk unnamed-chunk-24

# Make predictions for most likely model
o = with(fits, o[which.max(likelihood)])
model <- lm(risk ~ factor(facet) * I(dose + o*dose^2),
              data,
              weights=(1/error^2))
data$single_prediction <- predict(model)


# Plot a reproduction of 10B2
ggplot(data, 
       aes(x=dose, 
           y=risk,
           ymin=risk - error*2,
           ymax=risk + error*2, 32)) + 
  geom_point(aes(shape=source)) + 
  geom_pointrange(aes(ymin=risk - error*2, ymax=risk)) + 
  geom_pointrange(aes(ymin=risk, ymax=risk + error*2)) +
  geom_smooth(aes(x=dose, y=seperate_prediction),
              method="lm",
              formula = "y ~ x + I(x^2)",
              fullrange=TRUE,
              color='black') +
  geom_smooth(aes(x=dose, y=single_prediction),
              method="lm",
              formula = "y ~ x + I(x^2)",
              fullrange=TRUE,
              color='black',
              linetype='dotted') +
  facet_wrap(~ facet + cancer + sex + strain + species, 
             scales='free_y',
             ncol=3) +
  xlim(0, 2) +
  xlab("Dose (Gy)") +
  ylab("Risk %")

plot of chunk unnamed-chunk-24

# save fits for use in 10B4
write.csv(fits, 'data/10B2_fits.csv')

Figure: 10B2 reproduction

A direct reproduction of figure 10B2 from the BEIR VII report.

10B2

This is the original 10B2 figure from the beir VII report [beir-10b2].

10B2-image

Observations
  • We can reproduce BEIR VII This graph provides visual evidence that we can accurately reproduce best fit lines used for DDREF estimation from cancer data. These likelihoods will be combined with DDREF estimates from atomic bomb survivors and animal lifespan studies to provide a final estimate of DDREF. Only the lifespan estimates are to be updated by this work, but these other estimates must be faithfully reproduced so that they can compbine with the modified lifespan estimates to provide a final modified estimate of DDREF.

Reproduce BEIR 10B3

Last update: June 2013

Reproduce the BEIR estimates on the oak ridge lifespan data from storer 1979 (3575012.pdf).

note: Apparently BEIR VII grouped together acute exposures from tables 1 and 2 if they had the same total dose (without grouping, with grouping).

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
data <- read.csv('data/storer_1979.csv', sep='\t')  
    

model_10B3 <- function(data){
    glm(
        I(1/age - 1/636.175) ~ dose + I(dose^2 / (fractions)) - 1,
        data=data,
        weights=1/sd^2
    )
}   
show <- function(g){
    original <- g[is.na(g$p_10B3),] 
    predictions <- g[!is.na(g$p_10B3),]
    suppressWarnings(print(
    ggplot(original, aes(
        dose, 
        1/age, 
        label=type,
        group=type
    )) +
        geom_text(size=5) + 
        geom_smooth(
            data=predictions,
            aes(dose, p_10B3),
            color='black'
        ) +
       scale_y_continuous(breaks = 0.0002*0:4+0.0016)
    ))
}

    
# Constants
threshold = 1.5001

# Prediction Intervals
to_predict <- expand.grid(
    fractions = c(1, Inf),
    dose = seq(0, 1.5, 0.1)
)
to_predict$type <- 'A'
to_predict$type[to_predict$fractions > 1] <- 'C'

# Clean
data$fractions <- 1
data$fractions[data$rate < 0.1] <- Inf

# Define Acute
chronic <- data$rate < 0.1
data$type <- 'A'
data$type[chronic] <- 'C'

# Combine acute exposures with the same treatment
# (see note above for a justification)
# TODO(ben) do this step in every analysis that uses storer
data <- ddply(data, .(dose, type, sex, fractions, strain), function(df) {
  all <- function(x) paste(unique(x), sep=', ', collapse=', ')
  with(df,data.frame(
            table=all(table),
            n=sum(n),
            dose=dose[1],
            rate=mean(rate),
            strain=all(strain),
            sex=sex[1],
            age= sum(age * n) / sum(n),
            sd= (sum(sd^2 * n)/sum(n))^0.5,
            fractions=fractions[1],
            type=type[1]
  ))
})

# Subset
data$modeled_in_10B3 <- with(data, 
    dose < threshold &
    strain == 'RFM'  &
    sex == 'F'
)
data$in_my_analysis <- data$type == 'A'

# Save for later use
write.csv(data, 'data/storer_1979_processed.csv')

# prediction matrix
predictions <- to_predict
addin <- names(data)[!names(data) %in% names(predictions)]
predictions <- merge(data[data$modeled_in_10B3,][1, addin], predictions, all=TRUE)

# 10B3 Model
m <- model_10B3(data[data$modeled_in_10B3,])
predictions$p_10B3 <- predict(m, newdata=predictions) + 1/636.175

# My Model
s <- data$modeled_in_10B3 & 
     data$in_my_analysis
m <- model_10B3(data[s,])
predictions$p_my_analysis <- predict(m, newdata=predictions) + 1/636.175

# Merge
data$p_10B3 <- NA
data$p_my_analysis <- NA

data <- rbind(predictions, data)








# TODO(ben) delete this when you finally give up
#
# # data2
# data2 <- read.csv('data/storer_1979.csv', sep='\t')  
# data2 <- rbind(data2, data.frame(table=3, 
#                         n=312, 
#                         dose=0,
#                         rate=0.45,
#                         strain='RFM',
#                         sex='F',
#                         age=644.0,
#                         sd=8.21))
#  
# model_10B3 <- function(data){
#     intercept <- data[data$dose == 0, 'age']
#     glm(
#         I(1/age - 1/intercept) ~ dose + I(dose^2 / (fractions)) - 1,
#         data=data,
#         weights=1/(n*sd)^2
#     )
# }   
# 
# # Constants
# threshold = 1.5001
# 
# # Prediction Intervals
# to_predict <- expand.grid(
#     fractions = c(1, Inf),
#     dose = seq(0, 1.5, 0.1)
# )
# to_predict$type <- 'A'
# to_predict$type[to_predict$fractions > 1] <- 'C'
# 
# # Clean
# data2$fractions <- 1
# data2$fractions[data2$rate < 0.1] <- Inf
# 
# # Define Acute
# chronic <- data2$rate < 0.1
# data2$type <- 'A'
# data2$type[chronic] <- 'C'
# 
# # Combine acute exposures with the same treatment
# # (see note above for a justification)
# # TODO(ben) do this step in every analysis that uses storer
# data2 <- ddply(data2, .(dose, type, sex, fractions, strain), function(df) {
#   all <- function(x) paste(unique(x), sep=', ', collapse=', ')
#   with(df,data.frame(
#             table=all(table),
#             n=sum(n),
#             dose=dose[1],
#             rate=mean(rate),
#             strain=all(strain),
#             sex=sex[1],
#             age= sum(age * n) / sum(n),
#             sd= (sum(sd^2 * n)/sum(n))^0.5,
#             fractions=fractions[1],
#             type=type[1]
#   ))
# })
# 
# # Subset
# data2$modeled_in_10B3 <- with(data2, 
#     dose < threshold &
#     strain == 'RFM'  &
#     sex == 'F'
# )
# data2$in_my_analysis <- data2$type == 'A'
# 
# # prediction matrix
# predictions <- to_predict
# addin <- names(data2)[!names(data2) %in% names(predictions)]
# predictions <- merge(data2[data2$modeled_in_10B3,][1, addin], predictions, all=TRUE)
# 
# # 10B3 Model
# m <- model_10B3(data2[data2$modeled_in_10B3,])
# predictions$p_10B3 <- predict(m, newdata=predictions) + 1/data2[2,'age']
# 
# # My Model
# s <- data2$modeled_in_10B3 & 
#      data2$in_my_analysis
# m <- model_10B3(data2[s,])
# predictions$p_my_analysis <- predict(m, newdata=predictions) + 1/data2[2,'age']
# 
# # Merge
# data2$p_10B3 <- NA
# data2$p_my_analysis <- NA
# 
# data2 <- rbind(predictions, data2)
# 
# graph(data, data2)








graph <- function(data, data2){

  g <- data[with(data, 
      strain == 'RFM' & 
      sex == 'F' &
      rate != 0.4
  ),]
  original <- g[is.na(g$p_10B3),] 
  predictions <- g[!is.na(g$p_10B3),]
  
  g2 <- data2[with(data2, 
      strain == 'RFM' & 
      sex == 'F' &
      rate != 0.4
  ),]
  original2 <- g2[is.na(g2$p_10B3),] 
  predictions2 <- g2[!is.na(g2$p_10B3),]
  
  ggplot(NULL) +
    geom_text(data=original, 
              size=5,
              aes(dose, 
                  1/age, 
                  label=type,
                  group=type)) + 
    geom_smooth(data=predictions,
                aes(dose, p_10B3, group=type),
                color='black',
                se=F) +
    geom_text(data=original2, 
              size=5,
              aes(dose, 
                  1/age, 
                  label=type,
                  group=type,
                  color='red')) + 
    geom_smooth(data=predictions2,
                aes(dose, p_10B3, group=type),
                color='red',
                se=F) +
    scale_y_continuous(breaks = 0.0002*0:4+0.0016)
}

Show

10B3

This is the original 10B3 figure from the beir VII report [beir-10b3].

10B3-image

Reproduce 10B3

This is my reproduction of 10B3 to prove that I am faithfully applying their methodology.

g <- data[with(data, 
    strain == 'RFM' & 
    sex == 'F' &
    rate != 0.4
),]
show(g)

plot of chunk unnamed-chunk-26

ggsave_for_ppt('beir_10B3_reproduction.png')

Results

I am capable of reproducing their results very well. There are a few tricks that were not explicitly stated in BEIR VII.

  1. They combined treatement groups from tables 1 and 2 if they were acute and had the same total dose
  2. They only used female RFM mice.
  3. The regression was weighted by 1/sd^2 (or n?)
  4. The regression intercept was set to the control condition

Strangely, they do not include data from table 2, but when I add this in, it does not make a huge difference, so I assume they are just not being careful.

^ back to table of contents


Reproduce BEIR 10B4

Last update: June 2014 Reproduce the BEIR estimates on the oak ridge lifespan data from storer 1979 (3575012.pdf) and combine these with the oak ridge tumor estimates developed in the [10B2 reproduction][#10B2] to reproduce 10B4.

Verify that confidence intervals are the same as those seen in table 10-2.

# TODO(ben) make sure the reproduction is perfect, instead of almost perfect   

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
data <- read.csv('data/storer_1979_processed.csv') 
data_10B2 <- read.csv('data/10B2_fits.csv')

# Model
low = -2
high = 6
delta = 0.01
o_range = (low/delta):(high/delta) * delta

get_likelihoods <- function(o_range, data){
    r <- ldply(o_range, function(o){
        m <- model_10B4(data, o)
        l = logLik(m)
        
        data.frame(o, l)
    })
    
    r$l <- normalize_likelihood(r$l, delta)
    
    r
}

model_10B4 <- function(data, o){    
    glm(
        I(1/age) ~ I(dose + o*dose^2 / (fractions)),
        data=data,
        weights=n
    )
}

beir_r <- get_likelihoods(o_range, data[data$modeled_in_10B3,])

# Merge likelihoods with data from 10B2
alignment <- match( round(beir_r$o, 5),
                    round(data_10B2$o, 5))

beir_r <- beir_r %>%
  mutate(tumor = data_10B2$likelihood[alignment],
         lifespan = l,
         mean = (tumor + lifespan)/2) %>%
  select(-l)

# save for later
write.csv(beir_r, file='data/10B4.csv', row.names=FALSE)

Show

10B4

This is the original 10B4 figure from the beir VII report [beir-10b4].

10B4-image

Reproduce 10B4

This is my reproduction of 10B4 to prove that I am faithfully applying their methodology.

g <- melt(beir_r, 
          id.vars='o', 
          value.name='likelihood', 
          variable.name='source')
ggplot(g, aes(o, likelihood, linetype=source)) + 
  geom_path() +
  scale_linetype_manual(values=c('dotted', 'dotdash', 'solid')) +
  scale_y_continuous(breaks = c(0:5)/5, limits=c(0,1.2))

plot of chunk unnamed-chunk-28

Table 10-2

This is the original 10-2 table from the beir VII report [beir-10b4].

table-10-2-image

Reproduce Table 10-2 row 1

Reproduce confidence intervals from my re-production, to prove that I am faithfully determining confidence intervals and to butress visual proof with numeric evidence.

Note, my confidence interfal should match the first row and last column, the 95% confidence interval from radiobiology data.

# is        "0.4 (0.1, 3.2)"
# should be "0.5 (0.1, 3.2)"  
confidence_interval(beir_r$o, beir_r$mean, 0.05)
## [1] "0.44 (0.06, 3.22)"
## [1] 0.06 0.44 3.22

Results

I was able to reproduce figure 10B4 and the confidence intervals nearly perfectly. The slight disparity that is observed could be due to outstanding methodological issues or small differences in the data. For example, perhaps the original researchers applied rounding that I did not, or perhaps my estiamtes of the points on their graphs were different.

In any case, the estimate has changed a trivial amount from the original. Making it perfect is a job for later.

^ back to table of contents


Supplemental: Reproduce lifespan analysis using only acute data

Last update: June 2014 Individual level data from oak ridge is only available from acute exposures. How does the profile likelihood change if only this data is used?

This is a simple adaptation of the techniques used to [reproduce 10B4][#10B4].

# TODO(ben) make sure the reproduction is perfect, instead of almost perfect   

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
data <- read.csv('data/storer_1979_processed.csv') 
data_10B2 <- read.csv('data/10B2_fits.csv')

# Model
low = -2
high = 6
delta = 0.1
o_range = (low/delta):(high/delta) * delta

get_likelihoods <- function(o_range, data){
    r <- ldply(o_range, function(o){
        m <- model_10B4(data, o)
        l = logLik(m)
        
        data.frame(o, l)
    })
    
    r$l <- normalize_likelihood(r$l, delta)
    
    r
}

model_10B4 <- function(data, o){    
    glm(
        I(1/age) ~ I(dose + o*dose^2 / (fractions)),
        data=data,
        weights=n
    )
}

beir_r <- get_likelihoods(o_range, data %>% filter(in_my_analysis))

# Merge likelihoods with data from 10B2
alignment <- match( round(beir_r$o, 5),
                    round(data_10B2$o, 5))
beir_r <- beir_r %>%
  mutate(tumor = data_10B2$likelihood[alignment],
         lifespan = l,
         mean = (tumor + lifespan)/2) %>%
  select(-l)

Show

10B4 with acute lifespan data

Here is 10B4 if only acute data is used.

g <- melt(beir_r, 
          id.vars='o', 
          value.name='likelihood', 
          variable.name='source')
ggplot(g, aes(o, likelihood, linetype=source, color=source)) + 
  geom_path() +
  scale_linetype_manual(values=c('dotted', 'dotdash', 'solid')) +
  scale_color_manual(values=c('black', 'red', 'black')) +
  scale_y_continuous(breaks = c(0:5)/5, limits=c(0,1.2))

plot of chunk unnamed-chunk-31

Confidence intervals with acute only

Here’s how the confidence intervals of the mean come out when only acute data is used.

# was       "1.4 (1.1, 4.2)"
# is        "1.4 (1, 3.2)"  
confidence_interval(beir_r$o + 1, beir_r$mean, 0.05)
## [1] "1.4 (1, 3.2)"
## [1] 1.0 1.4 3.2

Results

If we only include acute results we see that the profile likelihood changes dramatically, giving a lower but flatter estimate. The mean confidence interval changes mildly the 50% point goes down marginally and the upper bound goes down substantially.

^ back to table of contents


Reproduce BEIR 10-2 (fail)

Last update: May 2014 Reproduce the BEIR estimates of dose reponse for atomic bomb survivors.

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
data <- read.csv('data/lss14/lss14_spruced_up.csv')

# Constants
threshold = 2.0

# Filter the data
data <- data %>%
  select(city, sex, gd3, ahs, ctime, solid,
         subjects, agex, age, dose, pyr, death) %>%
  filter(dose <= threshold)

# Model
model <- function(new_terms, threshold=Inf) {
  formula <- solid ~ city +
                     sex*I(log(age + 1)) +
                     sex*I(log(agex + 1)) +
                     offset(log(pyr)) -
                     1
  
  formula <- update(formula, new_terms)
  
  glm(formula,
      family='poisson',
      data = data %>% filter(dose <= threshold))
}

factor_fit <- model(~ . + factor(dose))
linear_fit <- model(~ . + dose, threshold=1.5)
curvature_3_fit <- model(~ . + I(dose + 0.3*dose^2), threshold=1.5)
curvature_7_fit <- model(~ . + I(dose + 0.7*dose^2), threshold=1.5)

# Data points to predict
to_predict <- expand.grid(
  city = unique(data$city),
  sex = unique(data$sex),
  agex = 30,
  age = 60,
  pyr = 1,
  dose = unique(data$dose))

# Make predictions
err <- function(model, data) {
  baseline <- data
  baseline$dose <- min(data$dose)

  risk <- exp(predict(model, newdata=data))
  control_risk <- exp(predict(model, newdata=baseline))
  err <- risk / control_risk - 1
  
  err
}
to_predict$err <- err(factor_fit, to_predict)
to_predict$linear_err <- err(linear_fit, to_predict)
to_predict$curvature_3_err <- err(curvature_3_fit, to_predict)
to_predict$curvature_7_err <- err(curvature_7_fit, to_predict)

# Average by Dose and city
aggregate <- to_predict %>%
  group_by(dose) %>%
  summarize(err=mean(err),
            linear_err=mean(linear_err),
            curvature_3_err=mean(curvature_3_err),
            curvature_7_err=mean(curvature_7_err))

Show

10-2

This is the original 10-2 figure from the beir VII report [beir-10-4].

10-2-image:

Reproduce 10-2

This is my attempt to reproduce 10-2 to prove that I am faithfully applying their methodology.

# Show data
ggplot(aggregate, aes(dose, err)) + 
  geom_point() +
  geom_smooth(aes(y=linear_err), 
              method='lm', 
              formula='y ~ x',
              se=FALSE) +
  geom_smooth(aes(y=curvature_3_err), 
              method='lm', 
              formula='y ~ I(x^2)',
              se=FALSE) +
  geom_smooth(aes(y=curvature_7_err), 
              method='lm', 
              formula='y ~ I(x^2)',
              se=FALSE)

plot of chunk unnamed-chunk-34

# TODO: If I look at the linear fits here they are not actually linear, because of the poisson x-form of course.  But they are linear at 10-3.  That's strange, it makes me think the fit a linear quadratic model after the fact.

Results

I was unable to reproduce figure 10-2 using original lss data. As you can see niether the points nor the fits reproduce correspond exactly.

There are several causes for these discrepiancies:

  1. BEIR VII does not specificy exactly what data was used for this figure. From the following quote it might be inferred that they used some work-in-progress dataset that is not publically available. > “Because the most recent cancer incidence data were not yet available outside of RERF, analyses of these data were conducted under the direction of the committee by RERF investigators who served as agents of the Academy.”
    > ANNEX 12B - pg 296
    > http://www.nap.edu/openbook.php?record_id=11340&page=296

  2. BEIR VII does not articulate how data was clustered they used some grouping of doses that is different than the one avaiable from the lss downloads. This would probably be a problem that we could solve by hand if we knew what data was being used.

  3. BEIR VII does not specify exactly what models were used for this fit. They reference a variety of models in the annexes of chapter 12 and we could probably determine which was used here if we could solve the first two problems.

What to do?

This reproduction is necessary to form a final estimate of DDREF using the same methodology as BEIR VII and therefore to know how much updates to the animal data affected the final result. It would be re-assuring to reproduce this data from source values, but fine if we can simply get the results as a profile likelihood. Therefore I will:

  1. Contact someone who helped to produce these results and ask them to help me.
  2. Try to reproduce figure 10-3 (the profile likelihood curves that will actually be used) using data reproduced from these graphs rather than original profile likelihood data.

^ back to table of contents


Reproduce BEIR 10-3 lss curve (from a reproduction of 10-2)

Last update: May 2014 Reproduce the BEIR VII estimates of DDREF likelihood using atomic bomb survivor data.

*Note this data is based on a reproduction of figure 10-2 instead of on original atomic bomb survivor data for the reasons listed in my attempt to reproduce 10-2

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
data <- read.csv('data/10-2.csv')  # a reproduction of BEIR 10-2

# Show 10-2
# Prove that we can do the linear and quadratic fits for 10-2
fit <- function(...) geom_smooth(
  ...,
  method='lm', 
  se=FALSE,
  color='black',
  fullrange=TRUE)

ggplot(data %>% filter(dose < 1.5), aes(dose, err)) +
  fit(formula='y ~ x - 1') +
  fit(formula='y ~ I(x + 0.3 * x^2) - 1', 
               linetype='dotted') +
  fit(formula='y ~ I(x + 0.7 * x^2) - 1',
               linetype='dotdash') + 
  scale_y_continuous(breaks = c(0:3)/2) +
  scale_x_continuous(breaks = c(0:4)/2) +
  geom_pointrange(data=data, 
                  aes(ymax=err + 2*sd, ymin=err - 2*sd)) +
  coord_cartesian(xlim=c(-0.1,2.1), ylim=c(-0.1,1.6))

plot of chunk unnamed-chunk-35

Reproduction of Figure 10-2

We managed a nearly perfect reproduction of 10-2 from the reconstituted data which shows that our formulas are correct.

Methodological error It is strange that the authors chose to determine DDREF on linear quadratic fits to excess relative risk, rather than fitting seperate models on the poisson regressions that were used to excess relative risk in the first place. I will try to correct that oddity in my subsequent work.

Trivial statistical error Erroneously, these fits are from unweighted regression where each point counts equally. They should have been performed such that data points with more people in them (lower doses) recieve a higher weight. Fortunately this error is not repeated for the profile likelihood analysis and so it does not affect the overall analysis, only this graph.

# Model
low = -2
high = 6
delta = 0.1  # decrease for higher resolution
o_range = (low/delta):(high/delta) * delta

# Data
data_to_model <- data %>% 
  # < 1.5 Sv of exposure
  filter(dose < 1.5) %>%
  # Prevent sd from ever equalling 0 (to allow 1/sd to work)
  mutate(sd = sd + 0.00001)

# Model
likelihoods <- ldply(o_range, function(o){
  model <- glm(err ~ I(dose + o*dose^2) - 1,
               data=data_to_model,
               weights=1/sd^2)
  c(o=o, l=logLik(model))
})

# Summarize Effect
likelihoods$l <- normalize_likelihood(likelihoods$l, delta)

# save for later
setwd('~/janus')
write.csv(likelihoods, file='data/10-3-lss.csv', row.names=FALSE)
# Show
ggplot(likelihoods, aes(o, l)) + 
  geom_path(linetype='dotdash') + 
  scale_y_continuous(breaks = c(0, 0.5, 1.0, 1.5), limits=c(0,1.5))

plot of chunk unnamed-chunk-37

BEIR 10-3 reproduction of lss analysis

We are able to form a ‘reasonable’ reproduction of the atomic bomb survivor curve from 10-3. It’s peak does not rise quite as high as the original firgure. This may be because of errors when re-constructing the data used in this figure, or because of unknown differences in methodology. In any case, the difference is small and difficult to resolve because the original data and full methodology is un-available.

10-3-image:

BEIR 10-3

This is the original 10-3 figure from the beir VII report [beir-10-3].

is “0.3 (-0.1, 1.6)”

should be “0.3 (-0.1, 1.5)”

confidence_interval(likelihoods\(o, likelihoods\)l, 0.05) ```

Table 10-2

This is the original 10-2 table from the beir VII report [beir-10b4].

table-10-2-image

Reproduce Table 10-2 row 2

Reproduce confidence intervals from my lss profile likelihood curve reproductions, to provide quantitiave evidence that I am faithfully applying the same methodology.

# is        "0.3 ( 0.0, 1.6)"
# should be "0.3 (-0.1, 1.5)"  
confidence_interval(likelihoods$o, likelihoods$l, 0.05)
## [1] "0.3 (-0.1, 1.6)"
## [1] -0.1  0.3  1.6

These values are very close to the original. Good reason to move forward.

^ back to table of contents


Reproduce BEIR 10-3 from reproductions of BEIR 10B4 and 10-2

Last update: June 2014 Reproduce the BEIR VII estimates of DDREF likelihood that combines data from atomic bomb survivors and animal data.

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
data_lss <- read.csv('data/10-3-lss.csv')
data_animal <- read.csv('data/10B4.csv')


# Merge likelihoods from 10-3 and 10B4
alignment <- match(round(data_lss$o, 5),
                   round(data_animal$o, 5))

# Determine profile resolutions
# (this is needed to normalize the profile likelihood curves for
#  graphing)
delta = data_lss$o[1] - data_lss$o[2]

data <- data_lss %>%
  mutate(l2 = data_animal$mean[alignment],
         'Radiobiological Prior' = l2,
         'LSS likelihood' = l,
         'Combined posterior' = -normalize_likelihood(
           log(l) + log(l2), 
           delta)
         ) %>%
  select(-l, -l2)
g <- melt(data, 
          id.vars='o', 
          value.name='likelihood', 
          variable.name='source')
ggplot(g, aes(o, likelihood, linetype=source)) + 
  geom_path() +
  scale_linetype_manual(values=c('dotted', 'dotdash', 'solid')) +
  scale_y_continuous(breaks = c(0, 0.5, 1, 1.5), limits=c(0,1.5))

plot of chunk unnamed-chunk-40

BEIR 10-3 reproduction

Pretty good reproduction. The lss curve is slightly lower than would be expected, this issue is discussed in the section where that reproduction was made.

10-3-image:

BEIR 10-3

This is the original 10-3 figure from the beir VII report [beir-10-3].

Table 10-2

This is the original 10-2 table from the beir VII report [beir-10b4].

table-10-2-image

Reproduce Table 10-2

Reproduce confidence intervals from my profile likelihood curve reproductions, to provide quantitiave evidence that I am faithfully applying (nearly) the same methodology.

# is        "0.3 ( 0.0, 1.6)"
# should be "0.3 (-0.1, 1.5)"  
o = data$o

# should be "0.5 ( 0.1, 3.2)"
# is        "0.4 ( 0.1, 3.2)"
confidence_interval(o, data[['Radiobiological Prior']], 0.05)
## [1] "0.4 (0.1, 3.2)"
## [1] 0.1 0.4 3.2
# should be "0.3 (-0.1, 1.5)"
# is        "0.3 (-0.1, 1.6)"
confidence_interval(o, data[['LSS likelihood']], 0.05)
## [1] "0.3 (-0.1, 1.6)"
## [1] -0.1  0.3  1.6
# should be "0.5 ( 0.1, 1.2)"
# is        "0.4 ( 0.1, 1.2)"
confidence_interval(o, data[['Combined posterior']], 0.05)
## [1] "0.4 (0.1, 1.2)"
## [1] 0.1 0.4 1.2

Nearly identical, only some very small (trivial) differences. Time to move forward.

^ back to table of contents


1/lifespan on all data

Last update: June 2013

Show graphs like Storer 1979 10-B3 but for all data.

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
data <- readRDS('data/ddref.rds')  

# Helpers
model_10B3 <- function(data){
    glm(
        I(1/age) ~ dose + I(dose^2 / (fractions)),
        data=data,
        weights=n
    )
}
show <- function(g){
    f <- function(x) round(x, 2)
    suppressWarnings(print(
    ggplot(g[is.na(g$p),], aes(
            dose, 
            1/age, 
            label=type,
            group=type
        )) + 
        geom_text(size=4) + 
        geom_smooth(
            data=g[!is.na(g$p),],
            aes(dose, p, color=type), 
            method='lm', 
            formula='y ~ x + I(x^2)',
            se=FALSE
        ) +
        facet_wrap(~ cluster, scales='free_y')
    ))
}

# Prediction Intervals
to_predict <- expand.grid(
    fractions = c(1, 1000),
    dose = seq(0, 1.5, 0.1)
)
to_predict$type <- 'A'
to_predict$type[to_predict$fractions > 1] <- 'C'


# Mean Lifespans
aggregate <- ddply(data, .(cluster, group_id, sex), function(df){
    u <- function(x) paste(unique(x), collapse=' ')
    dont_aggregate <- c('lifespan', 'id', 'X', 'n')
    data.frame(
        llply(df[,!names(df) %in% dont_aggregate], u),
        age=mean(df$lifespan),
        age_sd=sd(df$lifespan),
        n=nrow(df)
    )   
})

# Restore Sanity
numerics <- names(data)[laply(data, is.numeric)]
for(n in numerics) {
    if(n %in% names(aggregate)){
        aggregate[,n] <- as.numeric(as.character(aggregate[,n]))
    }
}

# Model
m <- c('cluster', 'sex', 'cluster')
df <- aggregate[
    aggregate$cluster == aggregate$cluster[1] &
    aggregate$sex == aggregate$sex[1] &
    aggregate$cluster == aggregate$cluster[1]
,]
aggregate <- ddply(
    aggregate, 
    .(cluster, sex, cluster), 
    function(df){
        # Extract Coefficients
        m <- model_10B3(df)
        c <- m$coefficients
        df$a <- c['dose']
        df$B <- c['I(dose^2/(fractions))']
        
        # Extract Predictions
        predictions <- to_predict
        addin <- names(df)[!names(df) %in% names(predictions)]
        predictions <- merge(df[1, addin], predictions, all=TRUE)
        predictions$p <- predict(m, newdata=predictions)
        
        # merge
        df$p <- NA
        out <- rbind(df, predictions)
        
        out
    }
)

#TODO(later) change the colors in order to make the theme consistent
# throughout the presentation.  I recommend black for chronic
# effects and red for accute.  Use transparency to represent my
# new results when I over-lay them so that the old results can 
# still be seen (or visa-versa).


# Show
# As in 10B3
# http://www.nap.edu/openbook.php?record_id=11340&page=257
a <- aggregate
g <- a
show(g)

plot of chunk unnamed-chunk-42

ggsave_for_ppt('inverse_lifespan.png')

Results

This data is far from well behaved!

Chronic effects may appear better or worse than projected acute effects. Sometimes hormesis like respsonses appear. Its not obvious from this graph because the y-axis is stretched to fit the data, but the magnitude of the effect is changing wildly too.

It is no wonder that radiobiology is full of debate!

At this point we should be a bit skeptical of organizing the data in this, the BEIR VII manner. While that approach seemed reasonable given the ORNL data that they worked with, it clearly does not generalize well. This may be because the underlying statitical approach is flawed, or simply that these graphs a very robust way of displaying the effect. In any case we question the ‘intuitive appeal’ of graph 10B3. While it seemed quite difinitive in isolation, the effect is lost when we try to repeat it on new datasets.

^ back to table of contents


1/lifespan profiles on all data

Last update: June 2013

Show graphs like Storer 1979 10B4 but for all data.

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
data <- readRDS('data/ddref.rds')


# Helpers
model_10B4 <- function(data, o){    
    glm(
        I(1/age) ~ I(dose + o*dose^2 / (fractions)),
        data=data,
        weights=n
    )
}       
    
# Mean Lifespans
aggregate <- ddply(data, .(cluster, group_id, sex), function(df){
    u <- function(x) paste(unique(x), collapse=' ')
    dont_aggregate <- c('lifespan', 'id', 'X', 'n')
    data.frame(
        llply(df[,!names(df) %in% dont_aggregate], u),
        age=mean(df$lifespan),
        age_sd=sd(df$lifespan),
        n=nrow(df)
    )   
})

# Restore Sanity
numerics <- names(data)[laply(data, is.numeric)]
for(n in numerics) {
    if(n %in% names(aggregate)){
        aggregate[,n] <- as.numeric(as.character(aggregate[,n]))
    }
}


# Model
low = -2
high = 6
delta = 0.01  # decrease for higher resolution
o_range = (low/delta):(high/delta) * delta
    

# Model
aggregate <- ldply(o_range, function(o){
    ddply(aggregate, .(cluster, sex), function(df){
        
        m <- model_10B4(df, o)
        df$l <- logLik(m)
        df$o <- o
        df
    })
})

# Summarize Effect
summary <- ddply(aggregate, .(o), function(df){
    sum(df$l, na.rm=TRUE)
})
names(summary) <- c('o','l')
summary$l <- normalize_likelihood(summary$l, delta)


# Normalize
aggregate <- ddply(aggregate, .(cluster, sex), function(df){
    df$l <- normalize_likelihood(df$l, delta)
    df
})

# Show
# As in 10B4
# http://www.nap.edu/openbook.php?record_id=11340&page=257
a <- aggregate

show <- function(g){
    g$cluster <- as.factor(as.character(g$cluster))
    g$l <- pmin(g$l, 1)
    suppressWarnings(print(
    ggplot(g, aes(o, l)) + 
        geom_path() +
        ylim(0,1) +
        facet_wrap(~ cluster)
    ))
}
g <- a
show(g)

plot of chunk unnamed-chunk-43

ggsave_for_ppt('inverse_lifespan_profile.png')

Results

Looks bad, we are way too confident!

^ back to table of contents


Meta Regression Figure

Last update: June 2013

A figure that shows off the principal of meta-regression.

    # Common
    setwd('~/janus')
    source('scripts/util/util.R') # http://goo.gl/VYzkAs
    
    # Fake data
    set.seed(1)
    sd <- 1
    n <- 5
    x <- 1:n
    v <- rnorm(n)^2
    data <- data.frame(
        yi = rnorm(n, 1:n, sd),
        vi = v,
        x = x
    )
    
    # An outlier
    data$yi[3] <- 0
    data$vi[3] <- .01
    
    # Model
    m <- rma(
        yi, 
        vi, 
        mods = cbind(x), 
        data = data,
        method="ML"
    )
    data$p <- predict(m)$pred
    data$tau2 <- m$tau2
    
    # Predict likelihood
    data$o2 <- (data$vi + data$tau2)
    data$e <- (data$p - data$yi)*.3
    n <- nrow(data)
    l <- with(data, sum( 
        -(1/2) * sum(e^2 / o2) +
        -(1/2) * log(o2) +
        -(1/2) * log(2*pi)
    ))
    logLik(m) 
## 'log Lik.' -8.707 (df=3)
    l  
## [1] -7.493
    # Show
    ggplot(data, aes(x, yi)) + 
        geom_point() +
        geom_errorbar(aes(
            ymin=yi - vi^0.5, 
            ymax=yi + vi^0.5,
        ), width=0.1, alpha=0.5) + 
        geom_errorbar(aes(
            ymin=yi - (vi + tau2)^0.5, 
            ymax=yi + (vi + tau2)^0.5,
        ), width=0.1, alpha=0.5, color='red') + 
        geom_path(aes(x, p), color='black')

plot of chunk unnamed-chunk-44

    ggsave_for_ppt('meta_regression_example.png')

^ back to table of contents


Meta Regression of BEIR 10B3

Last update: June 2013

BEIR fits oak ridge data as if they are points, but actually each point represents many samples and we know the standard deviation of this estimate. Therefore meta-regression is a more appropriate form of analysis. Here we show the fit by meta-regression.

We will also run a heterogeneity test.

    # Common
    setwd('~/janus')
    source('scripts/util/util.R') # http://goo.gl/VYzkAs
    
    # Data
    data <- read.csv('data/storer_1979.csv', sep='\t')  
    
    # Helpers
    model_10B3 <- function(data){
        glm(
            I(1/age) ~ dose + I(dose^2 / (fractions)),
            data=data,
            weights=n
        )
    }
    model_meta <- function(data){   
        data$a <- data$dose
        data$B <- with(data, dose^2 / (fractions))
        rma(
            yi, 
            vi, 
            mods = cbind(a, B), 
            data = data,
            method='ML'
        )
    }
    predict_meta <- function(m, newdata){
        newdata$a <- newdata$dose
        newdata$B <- with(newdata, dose^2 / (fractions))
        predict(m, newmods=with(newdata, cbind(a, B)))$pred
    }
    show <- function(g){
        original <- g[is.na(g$p_10B3),] 
        predictions <- g[!is.na(g$p_10B3),]
        
        suppressWarnings(print(
        ggplot(original, aes(
            dose, 
            1/age, 
            label=type,
            group=type
        )) + 
            geom_errorbar(aes(
                ymin=1/age + (vi + tau2)^0.5, 
                ymax=1/age - (vi + tau2)^0.5
                ), alpha=0.5, width=.05, color='red') +
            geom_errorbar(aes(
                ymin=1/age + vi^0.5, 
                ymax=1/age - vi^0.5
            ), alpha=0.5, width=.1) +
            geom_text(size=4) + 
            geom_line(
                data=predictions, 
                aes(dose, p_10B3)
            ) + 
            geom_line(
                data=predictions, 
                aes(dose, p_my_analysis), 
                color='red'
            ) +
            annotate(
                "text", 
                x=1, 
                y=0.0024, 
                label=paste(
                    "p heterogeneity < ", 
                    format(p_heterogeneity, scientific=TRUE, digits=1)
                )
            )
        ))
    }
    
    # Constants
    threshold = 1.5001

    # Prediction Intervals
    to_predict <- expand.grid(
        fractions = c(1, 1000),
        dose = seq(0, 1.5, 0.1)
    )
    to_predict$type <- 'A'
    to_predict$type[to_predict$fractions > 1] <- 'C'
    
    # Clean
    data$fractions <- 1
    data$fractions[data$rate < 0.1] <- Inf

    # Define Acute
    chronic <- data$rate < 0.1
    data$type <- 'A'
    data$type[chronic] <- 'C'
    
    # Subset
    data$modeled_in_10B3 <- with(data, 
        dose < threshold &
        strain == 'RFM'  &
        sex == 'F' &
        rate != 0.4
    )
    data$in_my_analysis <- data$type == 'A'
    
    # Prepare for Meta
    data$yi <- with(data, 1/age)
    data$vi <- with(data, (1/age - 1/(age + sd))^2)
    
    # Predictions
    predictions <- to_predict
    addin <- names(data)[!names(data) %in% names(predictions)]
    predictions <- merge(data[1, addin], predictions, all=TRUE)
    
    # 10B3 Model
    s <- data$modeled_in_10B3
    m <- model_10B3(data[s,])
    predictions$p_10B3 <- predict(m, newdata=predictions)
    
    # Meta Model
    s <- data$modeled_in_10B3
    m <- model_meta(data[s,])
    predictions$p_my_analysis <- predict_meta(m, newdata=predictions)
    data$tau2 <- m$tau2
    p_heterogeneity <- m$QEp
    
    # Merge
    data$p_my_analysis <- NA
    data$p_10B3 <- NA
    predictions$tau2 <- NA
    data <- rbind(data, predictions)

    # Show
    # Reproduce 10B3
    # http://www.nap.edu/openbook.php?record_id=11340&page=257
    g <- data[with(data, 
        strain == 'RFM' & 
        sex == 'F' &
        rate != 0.4
    ),]
    show(g)

plot of chunk unnamed-chunk-45

    ggsave_for_ppt('beir_10B3_meta_regression.png')

Results

The a/B ratio goes down a bit. Standard error bars are much larger than they were originally.

Heterogeneity is highly significant as measured by restricted maximum likelihood. More on that measurement from the metafor paper (http://www.jstatsoft.org/v36/i03/paper) and they cite

Q-test
Hedges LV, Olkin I (1985). Statistical Methods for Meta-Analysis. Academic Press, San Diego, CA.

REML
Viechtbauer W (2005). Bias and Eciency of Meta-Analytic Variance

Estimators in the Random-Eects Model." Journal of Educational and Behavioral Statistics, 30(3), 261-293.

^ back to table of contents


Meta Regression of BEIR 10B4

Last update: June 2013

BEIR VII estimates are based on ordinary linary regresssion of mean lifespans per group ignoring the fact that these means have a standard error.

This affects the likelihood estimate as an exact fit of the data is estimated as much more likely than a very near fit of the data. This becomes painfully obvious when we run the profile analysis on data containing only 3 groups in which case one particular curvature fits the data exactly and produces an estimate considered to be infinitely likely.

Here I will add standard error into the BEIR analysis both in the graphs and in the likelihood analysis. As before the data will come from storer 1979 (3575012.pdf).

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
data <- read.csv('data/storer_1979.csv', sep='\t')  

# Helpers
model_10B4 <- function(data, o){    
    glm(
        I(1/age) ~ I(dose + o*dose^2 / (fractions)),
        data=data,
        weights=n
    )
}
model_meta <- function(data, o){    
    data$curved_dose <- with(data, 
        dose + o*dose^2 / (fractions)
    )
    rma(
        1/age, 
        (1/age - 1/(age + sd))^2, 
        mods = cbind(curved_dose), 
        data = data,
        method='ML'
    )
}
        
# Constants
threshold = 1.5001

# Clean
data$fractions <- 1
data$fractions[data$rate < 0.1] <- Inf

# Define Acute
chronic <- data$rate < 0.1
data$type <- 'A'
data$type[chronic] <- 'C'

# Subset
data$modeled_in_10B4 <- with(data, 
    dose < threshold &
    strain == 'RFM'  &
    sex == 'F' # &
    # rate != 0.4
)
data$in_my_analysis <- data$type == 'A'

# Model
low = -2
high = 6
delta = .01   # Reduce to increase resolution
o_range = (low/delta):(high/delta) * delta

get_likelihoods <- function(
    o_range, 
    modeling_function=model_10B4,
    d=data[data$modeled_in_10B4,]
){
    r <- ldply(o_range, function(o){
        m <- modeling_function(d, o)
        l = logLik(m)
        
        data.frame(o, l)
    })
    
    r$l <- normalize_likelihood(r$l, delta)
    
    r
}


beir_r <- get_likelihoods(o_range)
my_r <- get_likelihoods(o_range, model_meta)


# Reproduce 10B4
# http://www.nap.edu/openbook.php?record_id=11340&page=258
ggplot(beir_r, aes(o, l)) + 
    geom_path() + 
    geom_path(data=my_r, color='red') + 
    scale_y_continuous(breaks = c(0:5)/5, limits=c(0,1))

plot of chunk unnamed-chunk-46

ggsave_for_ppt('beir_10B4_meta_reression.png')    

Results

I was able to reproduce figure 10B4 near perfectly. Importantly it becomes nearly straight if we only include the acute exposures.

It could be that the shoulder is very narrow, though this would be surprising given what we know about tissue level effects.

It could also be that there is some systematic bias between chronic and high dose experiments. For instance the high dose rate was even higher than the experimenters thought that it was.

More reason for meta-analysis.

^ back to table of contents


Meta-regression on all data

Last update: June 2013

Show graphs like Storer 1979 10B3 but for all data using random effects meta-regression.

# TODO(ben) Why is this section failing?

# Common
setwd('~/janus')
source('scripts/util/util.R') # http://goo.gl/VYzkAs

# Data
data <- readRDS('data/ddref.rds')

Modeling functions

Specify modeling functions.

model_10B3 will fit a linear quadratic model exactly as in the BEIR VII report, without accounting for within or between group error.

model_10B3 <- function(data){
    glm(
        I(1/age) ~ dose*cluster + I(dose^2/fractions)*cluster,
        data=data,
        weights=n
    )
}

model_meta will fit an identical model except that within group and between group error will be accounted for.

model_meta <- function(data){   
  data$a <- data$dose
  data$B <- with(data, dose^2 / (fractions))
  
  rma(
    yi, 
    vi, 
    mods = ~ a*cluster + B*cluster -a -B -1, 
    data = head(data, 6),
    method="ML"
  )
}

        
# # Mean Lifespans
# aggregate <- ddply(data, .(cluster, group_id, sex), function(df){
#     u <- function(x) paste(unique(x), collapse=' ')
#     dont_aggregate <- c('lifespan', 'id', 'X', 'n')
#     n <- nrow(df)
#     data.frame(
#         llply(df[,!names(df) %in% dont_aggregate], u),
#         age=mean(df$lifespan),
#         sd=sd(df$lifespan)/n^0.5,
#         n=nrow(df)
#     )   
# })
# 
# # Prepare for Meta
# aggregate$yi <- with(aggregate, 1/age)
# aggregate$vi <- with(aggregate, (1/age - 1/(age + sd))^2)   
# 
# # Restore Sanity
# numerics <- names(data)[laply(data, is.numeric)]
# for(n in numerics) {
#     if(n %in% names(aggregate)){
#         aggregate[,n] <- as.numeric(as.character(aggregate[,n]))
#     }
# }
# 
# # Model
# m <- model_meta(aggregate)
# c <- coefficients(m)
# aggregate <- aggregate %>%
#   mutate(i=c[paste0('cluster', cluster)],
#          a=c[paste0('a:cluster', cluster)],
#          B=c[paste0('cluster', cluster, ':B')],
#          tau2=m$tau2)
# aggregate$p_my_analysis <- predict(m)$pred
# aggregate$p_10B3 <- predict(model_10B3(aggregate))
# 
# # Project
# # Add projections across the entire range (0-1.5 Gy) for
# # acute and protracted exposures.  This will create nicer
# # graphs.
# doses = seq(from=0, to=1.5, by=.1)
# coefficients = unique(aggregate[,c('i', 'a', 'B', 'cluster')])
# projections = ddply(coefficients, .(cluster), function(df){
#   acute <- with(df, cbind(data.frame(
#     dose = doses,
#     type = 'A',
#     fractions = 1,
#     p_my_analysis = i + a * doses + B * doses^2
#   ), df))
#   
#   chronic <- with(df, cbind(data.frame(
#     dose = doses,
#     type = 'C',
#     fractions = 1000,
#     p_my_analysis = i + a * doses
#   ), df))
#   
#   rbind(acute, chronic)
# })
# projections$p_10B3 <- predict(model_10B3(aggregate),
#                               newdata=projections)
# 
# # Show
# g <- aggregate
# f <- function(x) round(x, 2)
# g$cluster <- with(g, paste0(
#     cluster, '\n',
#     #'a=', f(a), 
#     #' B=', f(B), 
#     ' ddref =', f((a + B) / a)
# ))
# projections$cluster <- projections$cluster
# projections$cluster <- with(projections, paste0(
#     cluster, '\n',
#     #'a=', f(a), 
#     #' B=', f(B), 
#     ' ddref =', f((a + B) / a)
# ))
# 
# ggplot(g, aes(
#     dose, 
#     1/age, 
#     label=type,
#     group=type)) + 
#     geom_errorbar(aes(
#         ymin=1/age + (vi + tau2)^0.5, 
#         ymax=1/age - (vi + tau2)^0.5
#         ), alpha=0.5, width=.05, color='red') +
#     geom_errorbar(aes(
#         ymin=1/age + vi^0.5, 
#         ymax=1/age - vi^0.5
#     ), alpha=0.5, width=.1) +
#     geom_text(size=4) + 
#     geom_smooth(
#         data = projections,
#         aes(dose, p_10B3), 
#         method='lm', 
#         formula='y ~ x + I(x^2)',
#         se=FALSE,
#         color='black'
#     ) + 
#     geom_smooth(
#         data = projections,
#         aes(dose, p_my_analysis), 
#         method='lm', 
#         formula='y ~ x + I(x^2)',
#         se=FALSE,
#         color='red'
#     ) + 
#     facet_wrap(~ cluster, scales="free_y")  
# 
# 
# ggsave_for_ppt('meta_regression.png')

Results

Curves don’t change that radically, though standard errors often change rather dramatically!

^ back to table of contents


Meta Regression profiles on all data

Last update: June 2013

Show graphs like Storer 1979 10B4 but for all data using the random effects meta regression.

# TODO(ben) why isn't this working?

# # Common
# setwd('~/janus')
# source('scripts/util/util.R') # http://goo.gl/VYzkAs
# 
# # Data
# data <- readRDS('data/ddref.rds')
#  
# # Helpers
# model_10B4 <- function(data, o){    
#     glm(
#         I(1/age) ~ cluster * I(dose + o*dose^2 / (fractions)),
#         data=data,
#         weights=n
#     )
# }   
# model_meta <- function(data, o){    
#     data$curved_dose <- with(data, dose + o*dose^2 / (fractions))
#     rma(
#         yi, 
#         vi, 
#         mods = cbind(cluster, curved_dose), 
#         data = data,
#         method="ML"
#     )
# }
# 
# 
# # Define Acute
# chronic <- data$fractions > 1
# data$type <- 'A'
# data$type[chronic] <- 'C'
#         
# # Mean Lifespans
# aggregate <- ddply(data, .(cluster, group_id), function(df){
#     u <- function(x) paste(unique(x), collapse=' ')
#     dont_aggregate <- c('lifespan', 'id', 'X', 'n')
#     n <- nrow(df)
#     data.frame(
#         llply(df[,!names(df) %in% dont_aggregate], u),
#         age=mean(df$lifespan),
#         sd=sd(df$lifespan)/n^0.5,
#         n=nrow(df)
#     )   
# })
# 
# # Prepare for Meta
# aggregate$yi <- with(aggregate, 1/age)
# aggregate$vi <- with(aggregate, (1/age - 1/(age + sd))^2)   
# 
# # Restore Sanity
# numerics <- names(data)[laply(data, is.numeric)]
# for(n in numerics) {
#     if(n %in% names(aggregate)){
#         aggregate[,n] <- as.numeric(as.character(aggregate[,n]))
#     }
# }
# 
# # Model
# low = -2
# high = 6
# delta = .1   # Reduce to increase resolution
# o_range = (low/delta):(high/delta) * delta
# 
# # Model
# summary <- ldply(o_range, function(o){
#       # BEIR
#       m <- model_10B4(aggregate, o)
#       l_10B4 <- logLik(m)
#       
#       # Random Effects
#       m <- model_meta(aggregate, o)
#       l_meta <- logLik(m)
#       
#       c(l_10B4=l_10B4, l_meta=l_meta, o=o)
# })
# summary <- summary %>% 
#   mutate(l_10B4 = normalize_likelihood(l_10B4, delta),
#          l_meta = normalize_likelihood(l_meta, delta))
# 
# # Normalize
# aggregate <- ddply(aggregate, .(cluster, sex), function(df){
#     df$l_10B4 <- normalize_likelihood(df$l_10B4, delta)
#     df$l_meta <- normalize_likelihood(df$l_meta, delta)
#     df
# })
# 
# 
# # Show
# # As in 10B4
# # http://www.nap.edu/openbook.php?record_id=11340&page=257
# a <- aggregate
# 
# show <- function(g){
#     g$cluster <- as.factor(as.character(g$cluster))
#     g$l_10B4 <- pmin(g$l_10B4, 1)
#     g$l_meta <- pmin(g$l_meta, 1)
#     suppressWarnings(print(
#     ggplot(g, aes(o, l)) + 
#         geom_path(aes(o, l_10B4), color='black') +
#         geom_path(aes(o, l_meta), color='red') +
#         facet_wrap(~ cluster) + scale_y_log10()
#     ))
# }
# 
# g <- a
# show(g)
# 
# 
# ggsave_for_ppt('meta_regression_profile.png')
# 
# summary$l_10B4 <- pmin(4, summary$l_10B4)
# summary$l_meta <- pmin(4, summary$l_meta)
# ggplot(summary, aes(o, l)) + 
#     geom_path(aes(o, l_meta))
#     ylim(0,4)
# ggsave_for_ppt('meta_regression_summary_effect.png')
# 
# 
# # TODO: Maybe I would not get absurd profile-likelihood values if I just perfomed a single regression, instead of multiple regressions with bayesian updates.  Concretely I would do 1/age ~ cluster * ...

Results

We still seem to have biased likelihood estimates, but things are improving a bit…

^ back to table of contents


To log or not to log?

It is strange that dose response in cell systems is linear quadratic for chromosomal abberations and log linear quadratic for cell survival. Concretely:

chromosomal abberations ~ a * Dose + B * Dose^2

TODO: add figure pic from thesis

whereas

log(survival) ~ a * Dose + B * Dose^2

TODO: add a pic of cell survival

I propose that the dose response is actually always log linear quadratic, but that for rare outcomes the distinction does not matter and a regular linear quadratic response will fit the data quite well.

To prove that this could be true I will develop a little fake survival data set and show that when viability is high, either formula fits the data, whereas when viability is low only the log linear quadratic formula fits the data.

TODO: ask Gayle or Little about this TODO: look in Hall and BEIR, why do they say that ERR has inward curvature?

# Fake data
n <- 200
a <- 0.001
B <- 0.0001
data <- data.frame(dose=0:n) %>% 
  mutate(viability = exp(-a*dose - B*dose^2))

# Wrapper function for plotting
show <- function(data) {
  ggplot(data, aes(dose, viability)) +
    geom_point(size=0.5) +
    geom_smooth(method='lm', 
                formula='y ~ x + I(x^2)',
                se=FALSE)
}
  

# Log scale
show(data) + scale_y_log10()

plot of chunk unnamed-chunk-51

# Normal scale
show(data)

plot of chunk unnamed-chunk-51

# Normal scale where viability is high
show(data %>% filter(viability > 0.70))

plot of chunk unnamed-chunk-51

Figures: Linear quadratic model fits In each figure dose (x axis) is plotted against viability (y axis).
Black dots represent hypothetical data points fit to a log linear quadratic model without error. The blue line represents a linear quadratic fit to the data. In the first figure, the data set is plotted with viability on a log scale. The second figure is identical, but viability is plotted on a linear scale. The third figure is identical to the second, on a linear scale, but the data is truncated to only include outcomes with high viability.

Results This hypothetical data proves the point, log linear quadratic models and linear quadratic models are nearly indistiguishiable when viablity is high, but only the log linear quadratic model can fit the data well when viablity extends into the low range.

The implication of this finding is that log linear quadratic models is always valid and is necessary if the outcome of interest affects a large percentage of the population (the cutoff is arbitrary, but the author might suggest > 30%). This further suggests that we need not create a threshold of 1.5 Gy or 2 Gy when assessing DDREF, instead we must fit results to log linear quadratic scales and avoid a cutoff entirely.

How could the literature be wrong If we read up in Hall, we find that the proposed cutoff is controversial, Gray proposed the bell shaped dose response in the 1960s to explain leukemia induction in mice which seemed to diminish at high doses. He proposed that this is because the cells that were damaged and would have lead to cancer had become inviable instead. Therefore some optimal dose results in maximum cancer induction by damaging many cells without inducing so much damage that they become inviable.

However, subsequent data has not born out this theory in other systems like breast cancer (Brehner and Sachs) in children as St. Jude’s treated for leukemia (Neglia 2006).

[][http://dl.dropbox.com/u/1131693/bloodrop/Screenshot%202014-05-07%2012.21.33.png]

Figures from Hall These figures, copied from Hall’s textbook illustrate Gray’s proposal and counter evidence from Brehner and Sachs.

Conclusion We would do well to use log linear quadratic models and avoid the tacit assumption that there is some threshold dose beyond which cancer rates (or mortality rates) fall.

TODO: Try estimating DDREF without a cutoff TODO: See if the hypotherical threshold proposed by BEIR VII actually plays out (figure 10-1)

^ back to table of contents